Tamper Resistant Software Encoding and Analysis

The present invention relates generally to computer software, and more
specifically, to a method and system of making computer software resistant to
tampering and reverse-engineering.

Background of the Invention 

The market for computer software in all of its various forms is recognized to
be very large and is growing every day. In industrialized nations, hardly a business
exists that does not rely on computers and software, either directly or indirectly, in
its daily operations. As well, with the expansion of powerful communication
networks such as the Internet, the ease with which computer software may be
exchanged, copied and distributed is also growing daily.

With this growth of computing power and communication networks, a user's
ability to obtain and run unauthorized or unlicensed software is becoming less and
less difficult, and a practical means of protecting such computer software has yet to
be devised.

Computer software is generally written by software developers in a high-level
language which must be compiled into low-level object code in order to execute on a
computer or other processor.

High-level computer languages use command wording that closely mirrors
plain language, so they can be easily read by one skilled in the art. Object code
generally refers to machine-executable code, which is the output of a software
compiler that translates source code from human-readable to machine-executable
code.

The low-level structure of object code refers to the actual details of how the
program works. Low-level analysis usually focuses on, or at least begins with, one
routine at a time. This routine may be, for example, a procedure, function or
method. Analysis of individual routines may be followed by analyses of wider scope
in some compilation tool sets.

The low-level structure of a software program is usually described in terms of
its data flow and control flow. Data-flow is a description of the variables together
with the operations performed on them. Control-flow is a description of how control
jumps from place to place in the program during execution, and the tests that are
performed to determine those jumps.

Tampering refers to changing computer software in a manner that is against
the wishes of the original author. Traditionally, computer software programs have

SUBSTITUTE SHEET (RULE 26) 



WO 02/095546 



PCT/CA02/00754 




had limitations encoded into them, such as requiring password access, preventing
copying, or allowing the software only to execute a predetermined number of times
or for a certain duration. However, because the user has complete access to the
software code, methods have been found to identify the code administering these
limitations. Once this coding has been identified, the user is able to overcome these
programmed limitations by modifying the software code.

Since a piece of computer software is simply a listing of data bits, ultimately,
one cannot prevent attackers from making copies and making arbitrary changes. As
well, there is no way to prevent users from monitoring the computer software as it
executes. This allows the user to obtain the complete data-flow and control-flow, so
it was traditionally thought that the user could identify and undo any protection. This
theory seemed to be supported in practice. This was the essence of the war between
copy-protection and hacking that was common on Apple II and early PC software,
and it has resulted in these copy-protection efforts being generally abandoned.

Since then, a number of attempts have been made to prevent attacks by
"obfuscating", or making the organisation of the software code more confusing and
hence more difficult to modify. Software is commercially available to "obfuscate"
source code in manners such as:

• globally replacing variable names with random character strings. For
  example, each occurrence of the variable name "SecurityCode" could be
  replaced with the character string "1xcd385mxc" so that it is more difficult for
  an attacker to identify the variables he is looking for;
• deleting comments and other documentation; and
• removing source-level structural indentations, such as the indentation of loop
  bodies, to make the loops more difficult to read.

While these techniques obscure the source code, they do not make any
attempts to deter modification. These methods produce superficial changes, but the
information exposed by deeper analyses employed by optimizing compilers and
similar sophisticated tools is changed very little. The data flow and control flow
information exposed by such analyses is either not affected at all, or is only slightly
affected, by the above methods of obfuscation. Once the attacker has figured out
how the code operates, he is free to modify it as he chooses.

A more complex approach to obfuscation is presented in issued United
States Patent No. 5,748,741 which describes a method of obfuscating computer
software by artificially constructing a "complex wall". This "complex wall" is
preferably a "cascade" structure, where each output is dependent on all inputs. The
original program is protected by merging it with this cascade, intertwining the two.
The intention is to make it very difficult for the attacker to separate the original
program from the complex wall again, which is necessary to alter the original
program. This system suffers from several major problems:

• large code expansion, exceeding a hundred fold, required to create a
  sufficiently elaborate complex wall, and to accommodate its intertwining with
  the original code; and
• low security since the obfuscated program may be divided into manageable
  blocks which may be de-coded individually, allowing the protection to be
  removed one operation at a time.

Other researchers are beginning to explore the potential for obfuscation in
ways far more effective than what is achieved by current commercial code
obfuscators, though still inferior to the obfuscation of issued United States Patent
No. 5,748,741. For example, in their paper "Manufacturing cheap, resilient, and
stealthy opaque constructs", Conference on Principles of Programming Languages
(POPL), 1998 [ACM 0-89791-979-3/98/01], pp. 184-196, C. Collberg, C.
Thomborson, and D. Low propose a number of ways of obscuring a computer
program. In particular, Collberg et al. disclose obscuring the decision process in the
program, that is, obscuring those computations on which binary or multiway
conditional branches determine their branch targets. Clearly, there are major
deficiencies to this approach, including:

• because only control-flow is being addressed, domain transforms are not
  used and data obfuscation is weak; and
• there is no effort to provide tamper-resistance. In fact, Collberg et al. do not
  appear to recognize the distinction between tamper-resistance and
  obfuscation, and as a result, do not provide any tamper-proofing at all.

The approach of Collberg et al. is based on the premise that obfuscation
cannot offer a complete solution to tamper protection. Collberg et al. state that: "... code
obfuscation can never completely protect an application from malicious
reverse-engineering efforts. Given enough time and determination, Bob will always be able
to dissect Alice's application to retrieve its important algorithms and data structures."

A software approach for computing with encrypted data is described by Niv
Ahituv, Yeheskel Lapid, and Seev Neumann, in Processing encrypted data,
Communications of the ACM 30(9), Sept 1987, pp. 777-780. This method hides the
actual value of the data from the software doing the computation. However, the
computations which are practical using this technique are quite restricted.

In Breaking abstractions and unstructuring data structures, IEEE International
Conference on Computer Languages, 1998, Christian Collberg, Clark Thomborson,
and Douglas Low provide more comprehensive proposals on obfuscation, together
with methods for obfuscation of structured and object-oriented data.

There remains a weakness, however, in the methods proposed by Ahituv et
al. and Collberg et al. Obfuscation and tamper-resistance are distinct problems, and
while weak obfuscation is provided by Ahituv et al. and Collberg et al., they do not
address tamper resistance at all. For example, consider removing password
protection from an application by changing the password decision branch from a
conditional one to an unconditional one. Plainly, this vulnerability cannot be
eliminated effectively by any amount of mere obfuscation. A patient attacker tracing
the code will eventually find the "pass, friend" / "begone, foe" branch instruction.
Identifying this branch instruction allows the attacker to circumvent a protection
routine by simply re-coding it to a non-conditional branch. Therefore, other methods
are required to avoid such single points of failure.
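
By way of illustration only, the single point of failure described above can be sketched in a few lines of Python. The function names and the password literal are hypothetical and are not taken from any cited work; the sketch merely shows that re-coding one conditional branch to an unconditional one defeats the check regardless of how the surrounding code is obscured.

```python
def check_password(supplied: str, expected: str) -> bool:
    """The decision an attacker tracing the code is looking for."""
    return supplied == expected


def run_original(supplied: str) -> str:
    # Conditional branch: "pass, friend" / "begone, foe".
    if check_password(supplied, "s3cret"):  # hypothetical password literal
        return "pass, friend"
    return "begone, foe"


def run_tampered(supplied: str) -> str:
    # The attacker's one-instruction patch: the conditional branch has been
    # re-coded as an unconditional one, so the password is never consulted.
    if True:
        return "pass, friend"
    return "begone, foe"


print(run_original("wrong guess"))
print(run_tampered("wrong guess"))
```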

The level of obfuscation obtained using the above techniques is plainly quite
weak, since the executed code, control flow and data flow analysed in graph form, is
either isomorphic to, or nearly isomorphic to, the unprotected code. That is,
although the details of the obfuscated code are different from the original code, the
general organisation and structure have not changed.

As noted above, it is desirable to prevent users from making small,
meaningful changes to computer programs, such as overriding copy protection and
timeouts in demonstration software. It is also necessary to protect computer
software against reverse engineering which might be used to identify valuable
intellectual property contained within a software algorithm or model. In hardware
design, for example, vendors of application specific integrated circuit (ASIC) cell
libraries often provide precise software models corresponding to the hardware, so
that users can perform accurate system simulations. Because such a disclosure
usually provides sufficient detail to reveal the actual cell design, it is desirable to
protect the content of the software model.

In other applications, such as emerging encryption and electronic signature
technologies, there is a need to hide secret keys in software programs and
transmissions, so that software programs can sign, encrypt and decrypt transactions
and other software modules. At the same time, these secret keys must be protected
against being leaked.

There is therefore a need for a method and system of making computer
software resistant to tampering and reverse engineering. This design must be
provided with consideration for the necessary processing power and real time delay
to execute the protected software code, and the memory required to store it.

Summary of the Invention 

It is therefore an object of the invention to provide a method and system of
making computer software resistant to tampering and reverse engineering which
addresses the problems outlined above.

The method and system of the invention recognizes that attackers cannot be
prevented from making copies and making arbitrary changes. However, the most
significant problem is "useful tampering", which refers to making small changes in
behaviour. For example, if the trial software was designed to stop working after ten
invocations, tampering that changes the "ten" to "hundred" is a concern, but
tampering that crashes the program totally is not a priority since the attacker gains
no benefit.

Data-flow describes the variables together with operations performed on
them. The invention increases the complexity of the data-flow by orders of
magnitude, allowing "secrets" to be hidden in the program, or the algorithm itself to
be hidden. "Obscuring" the software coding in the fashion of known code
obfuscators is not the primary focus of the invention. Obscurity is necessary, but not
sufficient, for achieving the prime objective of the invention, which is
tamper-proofing.

One aspect of the invention is broadly defined as a method of increasing the
tamper-resistance and obscurity of computer software code comprising the steps of:
proposing a set of possible encoding techniques; calculating the number of possible
solutions that would correspond to each of said set of possible encoding techniques;
and encoding said target program using the encoding technique that results in the
greatest number of possible solutions.

The Applicant has several pending patent applications describing various
techniques for converting computer software into tamper-resistant form. While it is
understood that these techniques could be applied in combination with one another,
the synergy that certain combinations would offer was not clear until the analysis
technique of the invention was conceived and applied.

Once these combinations were investigated further, it was also found that
certain improvements could be made to their implementations, which went beyond
the initial teachings.

One exceptionally effective technique is broadly defined as a combination of 
linear and residue number encoding (described herein as "alternative mixed 
encoding"). Another exceptionally effective technique is described as multinomial 
encoding. 

Brief Description of the Drawings 

These and other features of the invention will become more apparent from 
the following description in which reference is made to the appended drawings in 
which: 

Figure 1 presents an exemplary computer system in which the invention may be
embodied;

Figure 2 presents a flow chart of a general algorithm for implementation of the
invention;

Figure 3 presents a flow chart of a polynomial encoding routine in an embodiment of
the invention;

Figure 4 presents a flow chart of a residue number encoding routine in an
embodiment of the invention;

Figure 5 presents a flow chart of a routine for analysing the effectiveness of
particular tamper-resistant techniques in an embodiment of the invention;

Figure 6 presents a flow chart of a routine for applying the multinomial encoding
technique in an embodiment of the invention; and

Figure 7 presents a flow chart of a routine for applying the alternative mixed
encoding technique in an embodiment of the invention.

Detailed Description of Preferred Embodiments of the Invention

The invention lies in a means for recoding software code in such a manner
that it is fragile to tampering. Attempts to modify the software code will therefore
cause it to become inoperable in terms of its original function. The tamper-resistant
software may continue to run after tampering, but no longer performs sensible
computation.

The extreme fragility embedded into the program by means of the invention
does not cause execution to cease immediately once it is subjected to tampering. It
is desirable for the program to continue running so that, by the time the attacker
realizes something is wrong, the modifications and events which caused the
functionality to become nonsensical are far in the past. This makes it very difficult
for the attacker to identify and remove the changes that caused the failure to occur.

As a matter of background, an exemplary system on which the invention can
be implemented will first be presented with respect to Figure 1. Next, several
techniques which are presented in co-pending patent applications will be
described with respect to Figures 2 through 4. These techniques, particularly linear
encoding and residue number encoding, form the foundation for the new techniques
of the invention.

An example of a system upon which the invention may be performed is
presented as a block diagram in Figure 1. This computer system 10 includes a
display 12, keyboard 14, computer 16 and external devices 18.

The computer 16 may contain one or more processors or microprocessors,
such as a central processing unit (CPU) 20. The CPU 20 performs arithmetic
calculations and control functions to execute software stored in an internal memory
22, preferably random access memory (RAM) and/or read only memory (ROM), and
possibly additional memory 24. The additional memory 24 may include, for example,
mass memory storage, hard disk drives, floppy disk drives, magnetic tape drives,
compact disk drives, program cartridges and cartridge interfaces such as those
found in video game devices, removable memory chips such as EPROM or PROM,
or similar storage media as known in the art. This additional memory 24 may be
physically internal to the computer 16, or external as shown in Figure 1.

The computer system 10 may also include other similar means for allowing
computer programs or other instructions to be loaded. Such means can include, for
example, a communications interface 26 which allows software and data to be
transferred between the computer system 10 and external systems. Examples of
communications interface 26 can include a modem, a network interface such as an
Ethernet card, or a serial or parallel communications port. Software and data
transferred via communications interface 26 are in the form of signals which can be
electronic, electromagnetic, optical or other signals capable of being received by
communications interface 26. Multiple interfaces, of course, can be provided on a
single computer system 10.

Input and output to and from the computer 16 is administered by the
input/output (I/O) interface 28. This I/O interface 28 administers control of the
display 12, keyboard 14, external devices 18 and other such components of the
computer system 10.

The invention is described in these terms for convenience purposes only. It
would be clear to one skilled in the art that the invention may be applied to other
computer or control systems 10. Such systems would include all manner of
appliances having computer or processor control including telephones, cellular
telephones, televisions, television set top units, point of sale computers, automatic
banking machines, lap top computers, servers, personal digital assistants and
automobiles.

Compiler Technology

In the preferred embodiment, the invention is implemented in terms of an
intermediate compiler program running on a computer system 10. Standard
compiler techniques are well known in the art, and will not be reviewed in detail
herein. Two standard references which may provide necessary background are
"Compilers: Principles, Techniques, and Tools" 1988 by Alfred Aho, Ravi Sethi and
Jeffrey Ullman (ISBN 0-201-10088-6), and "Advanced Compiler Design &
Implementation" 1997 by Steven Muchnick (ISBN 1-55860-320-4). The preferred
embodiment of the invention is described with respect to static single assignment,
which is described in Muchnick.

Generally, a software compiler is divided into three components, described as
the front end, the middle, and the back end. The front end is responsible for
language dependent analysis, while the back end handles the machine-dependent
parts of code generation. Optionally, a middle component may be included to
perform optimizations that are independent of language and machine. Typically,
each compiler family will have only one middle, with a front end for each high-level
language and a back end for each machine-level language.

All of the components in a compiler family can generally communicate in a
common intermediate language so they are easily interchangeable. This
intermediate language is generally in a form which exposes both control- and
data-flow so that they are easily manipulated. Such an intermediate form may be referred
to as flow-exposed form.

In the preferred embodiment of the invention, it is the intermediate code that
will be manipulated to make the desired areas of the input software tamper-resistant.
The invention can most easily be applied to software code in Static Single
Assignment (SSA) form. SSA is a well-known, popular and efficient flow-exposed
form used by software compilers as a code representation for performing analyses
and optimizations involving scalar variables. Effective algorithms based on SSA
have been developed to address constant propagation, redundant computation
detection, dead code elimination, induction variable elimination, and other
requirements.

Of course, the method of the invention could be applied to flow-exposed
forms other than SSA, where these provide similar levels of semantic information, as
in that provided in Gnu CC. Gnu CC software is currently available at no cost from
the Free Software Foundation.

Similarly, the method of the invention could be applied to software in its high 
level or low level forms, if such forms were augmented with the requisite control- and 
data-flow information. This flexibility will become clear from the description of the 
encoding techniques described hereinafter. 

General Implementation of Tamper-Resistant Compiling 

In general, the tamper-resistant encoding techniques of the invention may be 
implemented as shown in Figure 2. 

To begin with, high level code can be converted to intermediate form at step
30, using an appropriate compiler front end. Any desirable code optimization should
then be performed at step 32. Code optimization would generally be ineffective if
implemented after the tamper-resistant encoding, as the tamper-resistant encoding
is deliberately designed to frustrate simplification and organization.

The tamper-resistant encoding is now performed in three passes of the
intermediate code graph for each phase of encoding, shown in Figure 2 as steps 38
through 46. In the preferred embodiment of the invention, the popular practice of
dividing the compiler into a number of phases, several dozen in fact, is being
followed. Each phase reads the SSA graph and does only a little bit of the encoding,
leaving a slightly updated SSA graph. This makes each phase easier to
understand and to debug. A "phase control file" may be used to specify the ordering
of the phases at step 38 and particular parameters of each phase, for added
flexibility in ordering phases. This is particularly useful when one phase is to be
tested by inserting auditing phases before and/or after it, or when debugging options
are added to various phases to aid debugging.

Whenever variable codings are chosen, three passes of the intermediate
code graph are generally required. In a first pass, at step 40, the tamper-resistant
encoding compiler 34 walks the SSA graph and develops a proposed system of
re-codings. If the proposed codings are determined to be acceptable at step 42, which
may require a second pass of the SSA graph, control proceeds to step 44, where the
acceptable re-codings are then made in a third pass. If the proposed coding is found
to contain mismatches at step 42, then recodings are inserted as needed to
eliminate the mismatches at step 46.

Once all of the encoding phases have been executed, the resulting
tamper-resistant intermediate code is then compiled into object code for storage or machine
execution by the compiler back end 48.

The tamper-resistant techniques described hereinafter would generally be
implemented at step 40 of such a routine.

Before considering the new analysis and tamper-resistant encoding
techniques, the polynomial (or linear) and residue number techniques described in
earlier patent applications should be reviewed.

Polynomial Coding 

The polynomial encoding technique takes an existing set of equations and
produces an entirely new set of equations with different variables. The variables in
the original program are usually chosen to have meaning in the real world, while the
new encoded variables will have no such meaning. As well, the clever selection of
constants and polynomials used to define the new set of equations may allow the
original mathematical operations to be hidden.

This technique represents a variable x by some polynomial of x, such as ax +
b where a and b are some random numbers. This technique allows us to hide
operations by changing their sense, or to distribute the definition of a variable around
in a program.

A convenient way to describe the execution of the polynomial routine is in
terms of a "phantom parallel program". As the polynomial encoding routine executes
and encodes the original software program, there is a conceptual program running in
parallel, which keeps track of the encodings and their interpretations. After the
original software program has been encoded, this "phantom parallel program" adds
lines of code which "decode" the output back to the original domain.

For example, if the SSA graph defines the subtraction of two variables as:

    z := x - y    (1)

this equation may be hidden by defining new variables:

    x' := ax + b    (2)

    y' := cy + d    (3)

    z' := ez + f    (4)

Next, a set of random values for constants a, b, c, d, e, and f is chosen, and the
original equation (1) in the software program is replaced with the new equation (5).
Note that, in this case, the constant c is chosen to be equal to -a, which hides the
subtraction operation from equation (1) by replacing it with an addition operation:

    z' := x' + y'    (5)

The change in the operation can be identified by algebraic substitution:

    z' := a(x - y) + (b + d)    (6)

Equation (5) is the equation that will replace equation (1) in the software
program, but the new equations (2), (3) and (4) will also have to be propagated
throughout the software program. If any conflicts arise due to mismatches,
RECODE operations will have to be inserted to eliminate them.

In generating the tamper-resistant software, the transformations of each
variable are recorded so that all the necessary relationships can be coordinated in
the program as the SSA graph is traversed. However, once all nodes of the SSA
graph have been transformed and the "decoding" lines of code added at the end, the
transformation data may be discarded, including equations (2), (3) and (4). That is,
the "phantom parallel program" is discarded, so there are no data left which an
attacker may use to reverse engineer the original equations.

Note that a subtraction has been performed by doing an addition without
leaving a negative operator in the encoded program. The encoded program only performs
a subtraction because the phantom program knows "c = -a". If the value of
the constant had been assigned as "c = a", then the encoded equation would really
be an addition. Also, note that each of the three variables used a different coding
and there was no explicit conversion into or out of any encoding.
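
By way of illustration only, equations (1) through (6) can be exercised with a minimal Python sketch. The helper names encode, decode and encoded_subtract, the seed and the ranges of the random constants are assumptions made for this example and do not appear in the description above. The sketch chooses c = -a, so that the encoded program performs only an addition, and interprets the result with e = a and f = b + d, as equation (6) implies.

```python
import random


def encode(value, scale, offset):
    """Linear encoding v' := scale * v + offset."""
    return scale * value + offset


def decode(encoded, scale, offset):
    """Inverse of encode; the division is exact for validly encoded integers."""
    return (encoded - offset) // scale


rng = random.Random(0)        # fixed seed only so the sketch is repeatable
a = rng.randint(2, 1000)      # illustrative ranges, chosen arbitrarily
b = rng.randint(-1000, 1000)
d = rng.randint(-1000, 1000)
c = -a                        # this choice hides the subtraction
e, f = a, b + d               # interpretation of z', from equation (6)


def encoded_subtract(x, y):
    x_enc = encode(x, a, b)   # equation (2)
    y_enc = encode(y, c, d)   # equation (3)
    return x_enc + y_enc      # equation (5): only an addition is executed


z = decode(encoded_subtract(17, 5), e, f)
print(z)                      # the decoded value equals 17 - 5
```

Only the "phantom" bookkeeping (the constants a through f) reveals that the addition in encoded_subtract stands for a subtraction in the original domain.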

For the case of:

    y := -x    (7)

one could choose:

    x' := ax + b, and    (8)

    y' := (-a)y + b    (9)

which would cause the negation operation to vanish, and x and y to appear to be the
same variable. The difference is only tracked in the interpretation.
Similarly, for the case of:

    y := x + 5    (10)

one could choose:

    y' := ax + (b + 5)    (11)

causing the addition operation to vanish. Again, now there are two different
interpretations of the same value.
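
As a hedged illustration of the vanishing negation, the following Python sketch assumes that the case being encoded is the negation y := -x and that both encodings share the same offset b. The constants a = 7 and b = 11 and the function names are arbitrary choices for this example. Under those assumptions the encoded values of x and y are numerically identical, so no negation instruction needs to appear in the encoded program.

```python
A, B = 7, 11  # arbitrary illustrative constants


def encode_x(x):
    """x' := a*x + b"""
    return A * x + B


def encode_y(y):
    """y' := (-a)*y + b"""
    return (-A) * y + B


def encoded_pair(x):
    """Return (x', y') for y := -x, computed independently."""
    y = -x
    return encode_x(x), encode_y(y)


x_enc, y_enc = encoded_pair(9)
print(x_enc == y_enc)  # the two encoded values coincide
```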

Figure 3 presents a simple implementation of the polynomial coding
technique. At step 58, a fragment of code from the SSA graph is analysed to
determine whether it defines a polynomial equation suitable for polynomial encoding.
If so, a suitable set of polynomial equations is defined at step 60 that accomplishes
the desired encoding. As noted above, this technique is generally applied to
physically distribute the definition of a variable throughout a program so a single
assignment is usually replaced by a system of assignments distributed throughout
the program.

For the simple polynomial scheme, the values of constants are generally
unrestricted and the only concern is for the size of the numbers. Values are chosen
which do not cause the coded program to overflow. In such a case, the values of
constants in these equations may be selected randomly at step 62, within the
allowable constraints of the program. However, as noted above, judicious selection
of values for constants may be performed to accomplish certain tasks, such as
inverting arithmetic operations.
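
One possible way to select random constants without causing overflow is sketched below in Python. This is an assumption-laden illustration rather than the routine of Figure 3: it assumes a 32-bit signed target word, assumes that a bound on the magnitude of the original variable is known, and uses the hypothetical names WORD_MAX and choose_linear_constants.

```python
import random

# Assumption: encoded values must fit in a 32-bit signed word.
WORD_MAX = 2**31 - 1


def choose_linear_constants(x_bound, rng):
    """Pick random (a, b) so that |a*x + b| <= WORD_MAX whenever |x| <= x_bound."""
    if x_bound < 1:
        raise ValueError("x_bound must be at least 1")
    # Reserve half of the word range for the scaled term a*x.
    max_scale = WORD_MAX // (2 * x_bound)
    if max_scale < 2:
        raise ValueError("x_bound leaves no room for a non-trivial scale")
    a = rng.randint(2, max_scale) * rng.choice((-1, 1))
    # Whatever range remains after scaling is available for the offset b.
    b_bound = WORD_MAX - abs(a) * x_bound
    b = rng.randint(-b_bound, b_bound)
    return a, b


rng = random.Random(2002)  # fixed seed only so the sketch is repeatable
a, b = choose_linear_constants(10**6, rng)
print(abs(a) * 10**6 + abs(b) <= WORD_MAX)
```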

At the decision block of step 64 it is then determined whether the entire SSA 
graph has been traversed, and if not, the compiler steps incrementally to the next 
code fragment by means of step 66. Otherwise, the phase is complete. 

Variations on this technique would be clear to one skilled in the art. For
example, higher order polynomials could be used, or particular transforms developed
to perform the desired hiding or inversion of certain functions.

Residue Number Coding

This technique makes use of the "Chinese Remainder Theorem" and is
usually referred to as "Residue Numbers" in text books (see "The Art of Computer
Programming", volume 2: "Seminumerical Algorithms", 1997, by Donald E. Knuth,
ISBN 0-201-89684-2, pp. 284-294, or see "Introduction to Algorithms", 1990, by
Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest, ISBN 0-262-03141-8,
pp. 823-826). A "base" is chosen, consisting of a vector of pairwise relatively
prime numbers, for example: 3, 5 and 7. Then, each variable x is represented as a
vector of the remainders obtained when this variable is divided by each element of
the "base", that is, x maps on to (x rem 3, x rem 5, x rem 7).

In this scheme, a "Modular Base" consists of several numbers that are
pairwise relatively prime. Two distinct integers are said to be relatively prime if their
only common divisor is 1. A set of integers are said to be pairwise relatively prime, if
for each possible distinct pair of integers from the set, the two integers of the pair are
relatively prime.

An example of such a set would be {3, 5, 7}. In this base, integers can be
represented as a vector of remainders by dividing by each element of the base. For example:

    0 = (0, 0, 0),
    1 = (1, 1, 1),
    5 = (2, 0, 5),
    100 = (1, 0, 2), and
    105 = (0, 0, 0).

Note that this particular base {3, 5, 7} has a period of 105, which is equal to
the product of 3 × 5 × 7, so that only integers inside this range may be represented.
The starting point of the range may be chosen to be any value. The most useful
choices in this particular example would be [0, 104] or [-52, 52].
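
The mapping into residue form described above may be sketched in Python as follows. The function names is_pairwise_coprime and to_residues are illustrative assumptions; the base {3, 5, 7} and the sample values are those used in the text.

```python
from itertools import combinations
from math import gcd


def is_pairwise_coprime(base):
    """True if every distinct pair of moduli has greatest common divisor 1."""
    return all(gcd(p, q) == 1 for p, q in combinations(base, 2))


def to_residues(x, base):
    """Represent integer x as its vector of remainders modulo each base element."""
    return tuple(x % n for n in base)


BASE = (3, 5, 7)

for value in (0, 1, 5, 100, 105):
    print(value, "=", to_residues(value, BASE))
```

Because 0 and 105 map to the same vector, the sketch also shows why only one period of 105 consecutive integers can be represented unambiguously in this base.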

If two integers are represented in the same base, simple arithmetic
operations may be performed very easily. Addition, subtraction and multiplication, for
example, may be performed component-wise in modular arithmetic. Again, using the
base of {3, 5, 7}:

    if: 1 = (1, 1, 1) and
        5 = (2, 0, 5), then
    1 + 5 = ((1 + 2) mod 3, (1 + 0) mod 5, (1 + 5) mod 7)
          = (0, 1, 6).

Of course, 1 + 5 = 6, and 6 in residue form with the same base is (0, 1, 6).
Subtraction and multiplication are performed in a corresponding manner.
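
A minimal Python sketch of component-wise residue arithmetic follows, assuming the same base {3, 5, 7}. The helper residue_op is a name introduced only for this illustration; it applies an ordinary integer operation to each pair of components and reduces the result by the corresponding modulus.

```python
import operator

BASE = (3, 5, 7)


def to_residues(x, base=BASE):
    """Represent integer x as its vector of remainders."""
    return tuple(x % n for n in base)


def residue_op(op, u, v, base=BASE):
    """Apply op component-wise, reducing each component by its own modulus."""
    return tuple(op(p, q) % n for p, q, n in zip(u, v, base))


one, five = to_residues(1), to_residues(5)
print(residue_op(operator.add, one, five))   # residue form of 1 + 5
print(residue_op(operator.sub, five, one))   # residue form of 5 - 1
print(residue_op(operator.mul, to_residues(4), five))  # residue form of 4 * 5
```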

Heretofore, division had been thought to be impossible, but can be done
advantageously in a manner of the invention. First, however, it is of assistance to
review the method of solving for the residue numbers.

Converting from an integer to a corresponding Residue Number is simply a
matter of dividing by each number in the base set to determine the remainders.
However, converting from a Residue Number back to the original integer is more
difficult. The solution as presented by Knuth is as follows. Knuth also discusses and
derives the general solution, which will not be presented here:

For an integer "a" which may be represented by a vector of residue numbers
$(a_1, a_2, \ldots, a_k)$:

$$a = (a_1 c_1 + a_2 c_2 + \cdots + a_k c_k) \pmod{n} \tag{12}$$

where:

$$a_i = a \pmod{n_i} \quad \text{for } i = 1, 2, \ldots, k$$

and:

$$n = n_1 \times n_2 \times \cdots \times n_k$$

and:

$$c_i = m_i \, (m_i^{-1} \bmod n_i) \quad \text{for } i = 1, 2, \ldots, k \tag{13}$$

and:

$$m_i = n / n_i \quad \text{for } i = 1, 2, \ldots, k \tag{14}$$

and where the notation "$(x^{-1} \bmod y)$" used above denotes that integer z such that
xz (mod y) = 1. For example, $(3^{-1} \bmod 7) = 5$ because 15 (mod 7) = 1, where
15 = 3 x 5.

In the case of this example, with a base (3, 5, 7), a vector of solution
constants, ($c_1 = 70$, $c_2 = 21$, $c_3 = 15$), one for each of the moduli 3, 5 and 7,
is calculated. Once these constants have been calculated, converting a residue
number (1, 1, 1) back to the original integer is simply a matter of calculating:

$$r_1 c_1 + r_2 c_2 + r_3 c_3 = 1 \times 70 + 1 \times 21 + 1 \times 15 = 106 \tag{15}$$

Assuming a range of [0, 104], multiples of 105 are subtracted, yielding an integer
value of 1.
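
The decoding step can be illustrated with a short Python sketch of equations (12) to (14). It assumes Python 3.8 or later (for `math.prod` and the modular-inverse form of `pow`) and a range of [0, n - 1]; the function names are hypothetical.

```python
from math import prod


def solution_constants(base):
    """c_i = m_i * (m_i^-1 mod n_i), with m_i = n / n_i, as in (13) and (14)."""
    n = prod(base)
    constants = []
    for n_i in base:
        m_i = n // n_i
        constants.append(m_i * pow(m_i, -1, n_i))  # pow(x, -1, y) is the inverse of x mod y
    return constants


def from_residues(residues, base):
    """Recover the integer in [0, n - 1] from its residues, as in (12)."""
    n = prod(base)
    c = solution_constants(base)
    return sum(r * c_i for r, c_i in zip(residues, c)) % n


print(solution_constants((3, 5, 7)))        # [70, 21, 15]
print(from_residues((1, 1, 1), (3, 5, 7)))  # 106 mod 105 = 1
print(from_residues((1, 0, 2), (3, 5, 7)))  # 100
```

The final reduction modulo n corresponds to subtracting multiples of 105 in the worked example.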

Most texts like Knuth discuss Residue Numbers in the context of hardware
implementation or high-precision integer arithmetic, so their focus is on how to pick a
convenient base and how to convert into and out of that base. However, in applying
this technique to the invention, the concern is how to easily create many diverse
bases.

In choosing a basis for Residue Numbers, quite a few magic coefficients may
be generated dependent on the bases. By observation of the algebra, it is desirable
to have different bases with a large number of common factors. This can be easily
achieved by having a list of numbers which are pairwise relatively prime, with each
base simply partitioning these numbers into its components. For example, consider the
set {16, 9, 5, 7, 11, 13, 17, 19, 23}, comprising nine small positive integers which are
either prime numbers or powers of prime numbers. One can obtain bases for
residual encoding by forming each component from any three distinct elements of
this set. This keeps the numbers roughly the same size and allows a total range of
5,354,228,880, which is sufficient for 32 bits. For example, one such base generated
in this manner might be
{16 x 9 x 11, 5 x 13 x 23, 7 x 17 x 19} = {1584, 1495, 2261}.
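
One way to generate many diverse bases from the same list is to shuffle the list and split it into groups. The following Python sketch does this; the helper `random_base`, its `seed` parameter and the equal-sized grouping are assumptions made for illustration, not details taken from the specification.

```python
import random
from math import prod

FACTORS = [16, 9, 5, 7, 11, 13, 17, 19, 23]  # pairwise relatively prime list from the text


def random_base(factors=FACTORS, parts=3, seed=None):
    """Partition the factor list into `parts` equal groups and multiply each group.

    Assumes len(factors) is a multiple of `parts`, so every factor is used once.
    """
    rng = random.Random(seed)
    shuffled = factors[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // parts
    groups = [shuffled[i * size:(i + 1) * size] for i in range(parts)]
    return [prod(group) for group in groups]


example = [16 * 9 * 11, 5 * 13 * 23, 7 * 17 * 19]
print(example)        # [1584, 1495, 2261]
print(prod(example))  # 5354228880, which exceeds 2**32
print(random_base(seed=1))
```

Every base produced this way has the same total range, because each one is a rearrangement of the same nine factors.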

The invention allows a system of many bases with hidden conversion
between those bases. As well, it allows the solution constants to be exposed without
exposing the bases themselves. The original bases used to convert the software to
residue numbers are not required to run the software, but would be required to
decode the software back to the original high level source code. The invention
allows a set of solution constants to be created which may run the software, without
exposing the original bases. Therefore, the solution constants are of no assistance
to the attacker in decoding the original software, or reverse engineering it.

To hide the conversion of a residue number, r, defined by a vector of
remainders $(r_1, \ldots, r_n)$ derived using a base of pairwise relatively prime numbers
$\{b_1, b_2, \ldots, b_n\}$, a vector of solution constants is derived as follows. Firstly, using the
method of Knuth, a vector of constants $(c_1, c_2, \ldots, c_n)$ may be determined which
provides the original integer by the calculation:

$$r = (r_1 c_1 + r_2 c_2 + \cdots + r_n c_n) \pmod{b} \tag{16}$$

where $b_i$ is the ith number in the vector of pairwise relatively prime numbers
$\{b_1, b_2, \ldots, b_n\}$. As each of the corresponding $r_1, r_2, \ldots, r_n$ is a residue, each will
be smaller than the corresponding $b_i$; therefore equation (16) may be simplified to:

$$r_j = (c_1 \bmod b_j) \times r_1 + (c_2 \bmod b_j) \times r_2 + \cdots + (c_n \bmod b_j) \times r_n \tag{17}$$
Each component $(c_i \bmod b_j)$ will be a constant for a given basis, and can be
pre-calculated and stored so that the residue numbers can be decoded, and the software
executed, when required. Because the factors $(c_i \bmod b_j)$ are not relatively
prime, they will have common factors. Therefore, the base $\{b_1, b_2, \ldots, b_n\}$ cannot be




solved from knowledge of this set of factors. Therefore, storing this set of solution 
constants with the encoded software does not provide the attacker with any 
information about the old or the new bases. 

Division of Residue Numbers

Most texts like Knuth also indicate that division is impossible. However, the 
invention provides a manner of division by a constant. 

In order to perform division by a constant using residue numbers, the divisor 
must be one of the numbers of the base: 
Let: the base be $\{b_1, b_2, \ldots, b_n\}$,
the divisor be $b_i$, which is a member of the set $\{b_1, b_2, \ldots, b_n\}$, and
the quotient be $\{q_1, q_2, \ldots, q_n\}$.
Then, to calculate $q_j$ (where i is not j):

$$q_j = (c_j / b_i \bmod b_j) \times r_j + ((c_i - 1) / b_i \bmod b_j) \times r_i \tag{19}$$

The algebraic derivation is straightforward, by symbolically performing the full
decoding and division. The key is the observation that all the other terms vanish due
to the construction of the $c_i$'s.

To calculate $q_i$, the terms do not vanish, so a computation must be made of:

$$q_i = (c_1 / b_i \bmod b_i) \times r_1 + \cdots + (c_n / b_i \bmod b_i) \times r_n \tag{20}$$

This equation does not take account of the range reduction needed, so a
separate computation is used to calculate the number of times the range has been
wrapped around, so that the proper value may be returned:

$$w_i = [(c_1 / b_i) \times r_1 + \cdots + (c_n / b_i) \times r_n] \,/\, (\mathit{rangeSize} / b_i) \times (\mathit{rangeSize} / b_i) \tag{21}$$

Therefore, the decoded integer value becomes:

$$x = q_i + (\mathit{rangeSize} / b_i) \times w_i \tag{22}$$

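
The computation of the quotient components $q_j$ for j not equal to i, as in equation (19), can be checked numerically. The Python sketch below assumes that the second term of (19) is $((c_i - 1)/b_i \bmod b_j) \times r_i$, which is what the algebra requires, that the value being divided is non-negative, and that the result is reduced modulo $b_j$. The function names are hypothetical, and the components $q_i$, w and equations (20) to (22) are not covered.

```python
from math import prod


def solution_constants(base):
    """c_k = m_k * (m_k^-1 mod b_k), with m_k the product of the other moduli."""
    n = prod(base)
    return [(n // b) * pow(n // b, -1, b) for b in base]


def quotient_component(residues, base, i, j):
    """Residue modulo base[j] of floor(x / base[i]), for j != i, following (19)."""
    c = solution_constants(base)
    b_i, b_j = base[i], base[j]
    term_j = (c[j] // b_i % b_j) * residues[j]        # c_j is an exact multiple of b_i
    term_i = ((c[i] - 1) // b_i % b_j) * residues[i]  # c_i - 1 is an exact multiple of b_i
    return (term_j + term_i) % b_j


base = (3, 5, 7)
x = 100                                # residues (1, 0, 2)
residues = tuple(x % b for b in base)
# Divide by b_i = 5 (index 1); the quotient is 20, whose residues are 2 (mod 3) and 6 (mod 7).
print(quotient_component(residues, base, 1, 0))  # 2
print(quotient_component(residues, base, 1, 2))  # 6
```

Only the stored constants and the residues are used; neither x nor the quotient is ever reconstructed as an ordinary integer.
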
Figure 4 presents a flow chart of a simple implementation of a Residue
Number encoding phase. The routine begins at step 68 by establishing a base set
of pairwise relative primes, for example, the set of {16, 9, 5, 7, 11, 13, 17, 19, 23} as
presented above. At step 70, a base is computed from this set as previously
described, such as {1584, 1495, 2261}. A suitable block of software code is selected
from the SSA graph and is transformed into residual form at step 72. If operators
are found which are not calculable in the residue domain, then they will be identified
in the phase control file, and those operators and their associated variables will be
encoded using a different technique. At step 74, a corresponding set of solution
constants is then calculated and is stored with the tamper-resistant program. As
noted above, these solution constants are needed to execute the program, but do
not provide the attacker with information needed to decode the tamper-resistant
program.

At step 76, a decision block determines whether the entire SSA graph has
been traversed, and if not, the compiler steps incrementally to the next code
fragment by means of step 78. At step 80, a determination is made whether to
select a new basis from the set of pairwise relative primes by returning to step 70, or
to continue with the same set by returning to step 72. Alternatively, one could return
to step 68 to create a completely new base set, though this would not generally be
necessary.

Once the decision block at step 76 determines that the SSA graph has been 
traversed, the phase is complete. 

With this background, the reader may now consider the new analysis and
tamper-resistant encoding techniques of the invention.

New Analysis and Tamper-Resistant Encoding Techniques 

The first part of this section is devoted to measuring the resistance of data
encodings to reverse engineering. We introduce a measure of encoding resistance
in an encoded world as a measure of uncertainty: specifically, the number of
possible solutions or "real" worlds which could correspond to the observable encoded
world. An attacker observing only operations in an encoded world and inputs to the
encoded world (i.e., all encoded input data) cannot distinguish between any of the
possible solutions. Thus, the larger the number of corresponding possible solutions
(i.e. the size of the "transform space"), the more uncertainty and resistance the
encoding has. Only one of these possible solutions, of course, is the correct
solution.

In other words, there is a specific original computation to be encoded. It can
be encoded according to one of a list of techniques, and for each technique, there
are many different computations which might be encoded to exactly the same
encoded computation (i.e., the computer instructions are exactly the same, but the
significance of the computation varies - the meaning/encoding of the input and
output values differs). The number of different computations which could lead to the
same instruction sequence constitutes the ambiguity for that technique (an attacker
sees exactly the same thing, and then has the problem of resolving the ambiguity
among the many possible meanings/encodings for the encoded computation).

We choose encoding techniques which provide sufficient ambiguity, as
defined above, to satisfy the security need for the computation to be encoded.

It is important to note that the ambiguity measures characterize the
resistance of encoding to an arbitrary attack which only uses information from the
encoded world.

We then present estimates of resistance of linear, residue and mixed
encodings (i.e. the use of linear and residue encodings in combination) for addition
and multiplication and demonstrate that maximal resistance is achieved for mixed
encoding. We show that there exist more resistant schemes for performing
multiplication in mixed encodings.

We estimate resistance of computation of arbitrary multivariate polynomials in
mixed encoding and propose several ways to increase the resistance of arbitrary
computation in mixed encoding.

1.0 Introduction: general scheme of using encodings in computations 

Below we present a brief overview on the problem of data encodings. 

1.1 What are Data Encodings?

Suppose we wish to compute

$$y = F(x_1, \ldots, x_n; c_1, \ldots, c_m), \tag{23}$$

where:

$x_1, \ldots, x_n$ is the input data,
$c_1, \ldots, c_m$ are some internal parameters we wish to hide from an attacker
by an encoding,
$y = (y_1, \ldots, y_t)$ is the output and
F is a function for its computation.

Encoding is a parametric collection of functions that map each integer into
tuples of integers:

$$x'_{ij} = f_j(x_i), \quad i = 1, \ldots, n, \; j = 1, \ldots, k \tag{24}$$

$$c'_{ij} = f_j(c_i), \quad i = 1, \ldots, m, \; j = 1, \ldots, k \tag{25}$$

An important additional requirement is that encoding must be consistent with
arithmetic operations. This means that for each basic arithmetic operation (+, ×, ÷)
there is a sequence of a constant number of operations over the encoded data
(which is called a replacing sequence) such that the original arithmetic operation can
be derived from the result of the replacing sequence of operations by a simple decoding
procedure.

Then, instead of (23) we compute

$$y'_i = F'_i(x'_{11}, \ldots, x'_{1k}, \ldots, x'_{n1}, \ldots, x'_{nk}; c'_{11}, \ldots, c'_{1k}, \ldots, c'_{m1}, \ldots, c'_{mk}), \tag{26}$$

where $F'_i$ are obtained from F by standard rules (using the encoding that is
consistent with the arithmetic operations used).

Then we apply decoding (the inverse function to encoding) to obtain the
original results of the computation.

1.2 A general scheme of using encodings in computations 

The following general scheme may be used: 

A. Encodings of data. At this stage for all input data we compute their encoded
values.

B. Computations with encoded data (Encoded world). At this stage we compute
with encoded data by formal rules replacing each basic operation by a
sequence of appropriate operations.

C. Decoding of results. At this stage encoded results obtained in computations
are decoded.

What are the main advantages of such a scheme? Note that Stage B is obtained by
an original program being cloaked by simple formal rules (automatically).

Stages A and C compute concrete functions, so they can be implemented
once (using identities or other techniques) for all programs being cloaked.

If we are able to hide information on stages A and C by cloaking, then the
resistance of the whole scheme will be determined by the resistance of the stage B.
By this reasoning it is vital to define what 'resistance to computations' means for the
encoded world (Stage B).
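
The three stages can be illustrated with a deliberately small Python sketch. It assumes a linear encoding x' = a·x + b with a shared scale a and per-variable offsets, chosen here only to keep Stage B to a single addition; the parameter values and function names are hypothetical.

```python
def encode(x, a, b):
    """Stage A: map a plain value into the encoded world."""
    return a * x + b


def add_encoded(xe, ye):
    """Stage B: the replacing sequence for '+'; it sees only encoded values."""
    return xe + ye


def decode_sum(ze, a, b1, b2):
    """Stage C: undo the encoding of the result, z' = a*(x + y) + (b1 + b2)."""
    return (ze - (b1 + b2)) // a


a, b1, b2 = 7, 11, -4              # hypothetical secret encoding parameters
xe = encode(20, a, b1)             # 151
ye = encode(22, a, b2)             # 150
ze = add_encoded(xe, ye)           # 301
print(decode_sum(ze, a, b1, b2))   # 42
```

An observer of Stage B sees only 151, 150 and 301; many different triples (a, b1, b2) and plain inputs are consistent with those numbers, which is the ambiguity the following sections set out to measure.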



1.3 Measure of resistance of encodings

We introduce a measure of encoding resistance in the encoded world as a
measure of uncertainty. We define the resistance measure as the number of
different possible solutions which can correspond to the observable encoded world.

An attacker observing only operations in the encoded world and inputs to the
encoded world (i.e., all encoded input data) cannot distinguish between any of such
possible solutions corresponding to the same encoded world. Therefore, the greater
the number of corresponding possible solutions, the greater the uncertainty and
resistance of the encoding method.

It is important to note that such a measure characterizes the resistance of
encoding to an arbitrary attack (exhaustive search) which uses only information from
the encoded world. It means that this measure characterizes absolute resistance.

The aim of introducing measures of resistance is to compare the resistance
of different encodings for computations.

2 Definition of a measure of encoding resistance

We wish to compute (23) using the encodings in (24).
The world of possible solutions is a tuple $(c_1, \ldots, c_m, x_1, \ldots, x_n)$ and the encoded
world is a tuple of the form:

$$\mathit{EncodWorld} = (x'_{11}, \ldots, x'_{1k}, \ldots, x'_{n1}, \ldots, x'_{nk}; c'_{11}, \ldots, c'_{1k}, \ldots, c'_{m1}, \ldots, c'_{mk}) \tag{27}$$

Definition. A measure of resistance of a scheme (24) - (26) is the number of
different possible solutions $(c_1, \ldots, c_m, x_1, \ldots, x_n)$ which correspond to the same
encoded world:

$$\mathit{EncodWorld} = (x'_{11}, \ldots, x'_{1k}, \ldots, x'_{n1}, \ldots, x'_{nk}; c'_{11}, \ldots, c'_{1k}, \ldots, c'_{m1}, \ldots, c'_{mk}) \tag{28}$$

We will denote it by $R_w = R_w(\mathit{EncodWorld})$.

An equivalent definition of the measure of resistance of a scheme is as
follows:

Definition. A measure of resistance of the scheme (24) - (26) is the number
of different possible solutions $(c_1, \ldots, c_m, x_1, \ldots, x_n)$ which can correspond to the same
encoded world, i.e. the number of different possible solutions $(c_1, \ldots, c_m, x_1, \ldots, x_n)$ for
which we can obtain the same encoded world using the same encoding scheme (the
same class of encoding but with other possible encoding parameters).
Examples of estimates of resistance for linear, residue and mixed encodings
(i.e. the simultaneous application of both linear and residue encodings) are
presented in the following sections.

3 Resistance of linear encoding

3.1 Resistance of linear encoding of a sum: $c_1 x_1 + \cdots + c_n x_n$

First let us consider the resistance given by the sum of two variables x and y:
$z = c_1 x + c_2 y$.

Let integers x and y be represented in a linear encoding as

$$\begin{aligned} x' &= a_1 \cdot x + b_1 \\ y' &= a_2 \cdot y + b_2 \end{aligned} \tag{29}$$

and

$$\begin{aligned} c'_1 &= \alpha_1 \cdot c_1 \\ c'_2 &= \alpha_2 \cdot c_2 \end{aligned} \tag{30}$$

and

$$\begin{aligned} A_1 &= \alpha_1 a_1 / m \\ A_2 &= \alpha_2 a_2 / m \end{aligned} \tag{31}$$

where

$$m = \mathrm{GCD}(\alpha_1 a_1, \alpha_2 a_2) \tag{32}$$

For calculating z in linear encoding we need the following relationship:

$$z' = A_2 \cdot c'_1 x' + A_1 \cdot c'_2 y'$$

The observable world is determined by the following parameters: $c'_1, c'_2, x', y', A_1, A_2$,
and the number of possible solutions (let us denote it as $R_w$) is the number
of solutions of the corresponding system of equations (29 - 32).

The solution (i.e., one of the possible solutions $R_w$) is a set of values for
$x, y, c_1, c_2, a_1, a_2, b_1, b_2, \alpha_1, \alpha_2$. Let us denote the range of possible values as K.
We now make the following Propositions. 
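
Before turning to the propositions, the relationship for z' can be checked numerically. The Python sketch below assumes $A_1 = \alpha_1 a_1 / m$ and $A_2 = \alpha_2 a_2 / m$ with m as in (32), and verifies that z' is a linear function f·z + g of the plain result $z = c_1 x + c_2 y$. The names f and g, and all parameter values, are illustrative only.

```python
from math import gcd


def encoded_sum(x, y, c1, c2, a1, b1, a2, b2, alpha1, alpha2):
    """Compute z' from encoded operands and return it with the implied f and g."""
    xe, ye = a1 * x + b1, a2 * y + b2                   # (29)
    c1e, c2e = alpha1 * c1, alpha2 * c2                 # (30)
    m = gcd(alpha1 * a1, alpha2 * a2)                   # (32)
    A1, A2 = alpha1 * a1 // m, alpha2 * a2 // m         # (31), assumed form
    ze = A2 * c1e * xe + A1 * c2e * ye                  # uses only observable values
    f = alpha1 * alpha2 * a1 * a2 // m                  # scale relating z' to z
    g = A2 * c1e * b1 + A1 * c2e * b2                   # offset relating z' to z
    return ze, f, g


x, y, c1, c2 = 5, 9, 2, 3
ze, f, g = encoded_sum(x, y, c1, c2, a1=4, b1=1, a2=6, b2=2, alpha1=3, alpha2=5)
print(ze == f * (c1 * x + c2 * y) + g)  # True
```

The attacker sees the inputs to the line computing `ze` but not f or g, so the decoded value of z is not determined by the observable world alone.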

Proposition 1. For fixed $c'_1, c'_2, x', y', A_1, A_2$ and $a_1, a_2, c_1, c_2$ the number of
possible solutions $R_w \ge K^2$.

Proof: The proof follows from the fact that arbitrary values of x or y are
solutions of our system, since for any x (y) we can choose $b_1$ ($b_2$) such that the value
of x' (y') does not change.

Proposition 2. For fixed $c'_1, c'_2, x', y', A_1, A_2$ and $c_1, c_2, x, y$ the number of
possible solutions can be estimated as $R_w \ge K/A$, where A is a range of variation of
$a_1$ and $a_2$.

Proof: Note that for some solution $a_1, a_2$ of our system and for any q the
values $\bar{a}_1 = q \cdot a_1$ and $\bar{a}_2 = q \cdot a_2$ also give a solution, because $A_1, A_2$ are the same
and there exist $b_1$ and $b_2$ such that x' and y' do not change.

Proposition 3. The number of possible solutions is $R_w \ge K^3/A$, where A is a
range of variation of $a_1$ and $a_2$.

Proof: This proposition follows immediately from Proposition 1 and
Proposition 2.

Now we can return to the question of resistance of the sum: $c_1 x_1 + \cdots + c_n x_n$.
Based on the results achieved above it follows that $R_w \ge K^{n+1}/A$, where A is a range
of variation of $a_1, \ldots, a_n$.

If $K = 2^{64}$ (the usual range for representing integers in Java) this gives a lower
bound of resistance $> 2^{100}$, which seems large enough from the point of view of
computational complexity (enumerating all possible solutions is impossible) and from
a probabilistic point of view (the probability of guessing the right parameters is less than $2^{-100}$).
This is based on doing an exhaustive search.

3.2 Resistance of linear encoding of a product: $x_1 \cdots x_n$

Now consider the resistance of the product of two variables x and y: z = x · y.
Remembering equations (29)

$$\begin{aligned} x' &= a_1 \cdot x + b_1 \\ y' &= a_2 \cdot y + b_2 \end{aligned}$$

To find z in linear encoding one needs to calculate

$$z' = x' \cdot y' - b_2 \cdot x' - b_1 \cdot y' \tag{33}$$

which is related to z by the formula

$$z' = f \cdot z + g \tag{34}$$

where

$$f = a_1 \cdot a_2 \tag{35}$$

and

$$g = -b_1 \cdot b_2 \tag{36}$$

The observable (encoded) world is determined by the following parameters:
$b_1, b_2, x', y'$, and the number of possible solutions (let us denote it as $R_w$) is the
number of solutions of the corresponding system of equations (29) and (34 - 36).
The solution (i.e., one of the possible solutions $R_w$) is a set of values for $x, y, a_1, a_2,
b_1, b_2$.

From the equations and the observable data $b_1$ and $b_2$, it follows that $a_1 \mid (x' - b_1)$
and $a_2 \mid (y' - b_2)$, where "$p \mid q$" means that p divides q, i.e., $q = m \cdot p$ for some
integer m. Then we have relations $a_1 \cdot x = x' - b_1$ and $a_2 \cdot y = y' - b_2$. Hence, the
following statement holds:

Proposition 4. The resistance $R_w$ for multiplication in linear encoding is equal
to the product of the numbers of divisors of the integers $x' - b_1$ and $y' - b_2$.

Note that there are situations when resistance of linear encoding is not
enough, namely equal to 1 when integers $x' - b_1$ and $y' - b_2$ are primes. In the latter
case an attacker can find all of the parameters of the linear encoding.
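
Proposition 4 lends itself to a direct computation. The Python sketch below first checks the product formula (33) against z' = f·z + g, taking g = -b_1·b_2 as obtained by expanding (33), and then counts divisors. It counts positive divisors only, which is an assumption about how the proposition is to be read; the function names and parameter values are hypothetical.

```python
from math import isqrt


def count_divisors(n):
    """Number of positive divisors of a non-zero integer n."""
    n = abs(n)
    count = 0
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            count += 1 if d * d == n else 2  # d and n // d, counted once if equal
    return count


def encoded_product(xe, ye, b1, b2):
    """Replacing sequence (33) for multiplication in linear encoding."""
    return xe * ye - b2 * xe - b1 * ye


a1, b1, a2, b2 = 3, 5, 4, 7        # hypothetical encoding parameters
x, y = 6, 10
xe, ye = a1 * x + b1, a2 * y + b2  # 23 and 47
ze = encoded_product(xe, ye, b1, b2)
print(ze == (a1 * a2) * (x * y) - b1 * b2)                    # True
print(count_divisors(xe - b1) * count_divisors(ye - b2))      # 6 * 8 = 48
```

Each divisor of x' - b_1 is a candidate value of a_1 (with x the cofactor), and likewise for y' - b_2, so the product of the two counts is the number of candidate solutions an attacker would have to consider.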



4. Resistance of residue encoding

The residue encoding of integer x is

$$x'_i = x \pmod{p_i}$$

where $p_i$, i = 1, ..., k are coprime integers.

The encoded (observable) world is $(x'_1, \ldots, x'_k)$, while the world of possible
solutions is $(x, p_1, \ldots, p_k)$.

Let $p = p_1 \cdot \ldots \cdot p_k$. The resistance of residue encoding can be estimated via
the function S(p, k), where S(x, k) is the number of different representations of
integer x as a product of k mutually coprime numbers.

Proposition.

$$R_w \ge S(p, k).$$

This estimation of $R_w$ comes from the evaluation by I. Niven and H. S. Zuckerman in:
An Introduction to the Theory of Numbers, Wiley, 1980.

5. Alternative mixed encoding: addition and multiplication

In this section a more resistant method for performing addition and
multiplication in a mixed encoding (i.e. a combination of linear and residue encoding)
is presented.

Let integers x and y be represented in mixed encodings as

$$\begin{aligned} x'_1 &= a_1 \cdot x_1 + b_1 \bmod p_1 \\ &\;\;\vdots \\ x'_n &= a_n \cdot x_n + b_n \bmod p_n \end{aligned} \tag{37}$$

and

$$\begin{aligned} y'_1 &= c_1 \cdot y_1 + d_1 \bmod p_1 \\ &\;\;\vdots \\ y'_n &= c_n \cdot y_n + d_n \bmod p_n \end{aligned} \tag{38}$$

and let $\mathrm{GCD}(a_k, p_k) = \mathrm{GCD}(c_k, p_k) = 1$ for all k = 1, ..., n.

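5.1 Addition

Find $\lambda_k$ and $\mu_k$ satisfying the equations:

$$\begin{aligned} \lambda_k \cdot a_k &\equiv 1 \bmod p_k \\ \mu_k \cdot c_k &\equiv 1 \bmod p_k \end{aligned} \tag{39}$$

For each k choose $m_k$ such that $\mathrm{GCD}(m_k, p_k) = 1$ and then take two different
representatives $m_k^{(1)}$ and $m_k^{(2)}$ of the mod $p_k$ class of $m_k$. Then denote:

$$\begin{aligned} \Lambda_k &= m_k^{(1)} \cdot \lambda_k \\ M_k &= m_k^{(2)} \cdot \mu_k \end{aligned} \tag{40}$$

Let z = x + y; we are looking for the sum of two variables x and y. To find
z in mixed encoding it is sufficient to calculate:

$$\begin{aligned} z'_1 &= \Lambda_1 \cdot x'_1 + M_1 \cdot y'_1 \bmod p_1 \\ &\;\;\vdots \\ z'_n &= \Lambda_n \cdot x'_n + M_n \cdot y'_n \bmod p_n \end{aligned} \tag{41}$$

which are connected to $z = (z_1, \ldots, z_n)$ by the formulas

$$\begin{aligned} z'_1 &= f_1 \cdot z_1 + g_1 \bmod p_1 \\ &\;\;\vdots \\ z'_n &= f_n \cdot z_n + g_n \bmod p_n \end{aligned} \tag{42}$$

where

$$\begin{aligned} f_1 &= m_1 \bmod p_1 \\ &\;\;\vdots \\ f_n &= m_n \bmod p_n \end{aligned} \tag{43}$$

and

$$\begin{aligned} g_1 &= \Lambda_1 \cdot b_1 + M_1 \cdot d_1 \bmod p_1 \\ &\;\;\vdots \\ g_n &= \Lambda_n \cdot b_n + M_n \cdot d_n \bmod p_n \end{aligned} \tag{44}$$

Observable values: an attacker observes only $\Lambda_k$, $M_k$ and the variables $x'_k$ and $y'_k$.
Note that the following relations hold:

$$\mathrm{GCD}(a_k, p_k) = \mathrm{GCD}(c_k, p_k) = 1 \text{ for all } k = 1, \ldots, n \tag{45}$$

$$\text{For each } k, \; \mathrm{GCD}(m_k, p_k) = 1 \tag{46}$$

$$m_k^{(1)} \equiv m_k^{(2)} \bmod p_k \tag{47}$$

$$\lambda_k \cdot a_k \equiv 1 \bmod p_k \tag{48}$$

$$\mu_k \cdot c_k \equiv 1 \bmod p_k \tag{49}$$

$$\Lambda_k = m_k^{(1)} \cdot \lambda_k \tag{50}$$

$$M_k = m_k^{(2)} \cdot \mu_k \tag{51}$$

$$\begin{aligned} x'_1 &\equiv a_1 \cdot x_1 + b_1 \bmod p_1 \\ &\;\;\vdots \\ x'_n &\equiv a_n \cdot x_n + b_n \bmod p_n \end{aligned} \tag{52}$$

$$\begin{aligned} y'_1 &\equiv c_1 \cdot y_1 + d_1 \bmod p_1 \\ &\;\;\vdots \\ y'_n &\equiv c_n \cdot y_n + d_n \bmod p_n \end{aligned} \tag{53}$$

The world of possible solutions is $(x, y, a_k, b_k, c_k, d_k, p_k)$.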
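
The addition scheme can be checked for a single component k with the following Python sketch. It assumes the reading $\Lambda_k = m_k^{(1)} \lambda_k$ and $M_k = m_k^{(2)} \mu_k$ for (40), takes m and m + p as the two representatives of the class of m, and uses small hypothetical parameters.

```python
def mixed_add_check(p, a, b, c, d, m, x, y):
    """Return (z' computed in the encoded world, f*z + g mod p) for one component."""
    lam = pow(a, -1, p)             # lambda * a = 1 mod p, as in (39)
    mu = pow(c, -1, p)              # mu * c = 1 mod p, as in (39)
    m1, m2 = m, m + p               # two different representatives of m mod p
    Lam, M = m1 * lam, m2 * mu      # (40), assumed form
    xe = (a * x + b) % p            # (37)
    ye = (c * y + d) % p            # (38)
    ze = (Lam * xe + M * ye) % p    # (41): only Lam, M, xe, ye are used here
    f = m % p                       # (43)
    g = (Lam * b + M * d) % p       # (44)
    return ze, (f * ((x + y) % p) + g) % p


ze, expected = mixed_add_check(p=11, a=3, b=4, c=5, d=2, m=7, x=6, y=8)
print(ze, expected)  # 6 6
```

The two returned values agree for every x and y, confirming that z' carries the sum in a linear encoding with scale f and offset g that the encoded world never sees directly.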

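5.2 Multiplication

Now consider the multiplication of two encoded data elements:

$$\begin{aligned} x'_1 &\equiv a_1 \cdot x_1 + b_1 \bmod p_1 \\ &\;\;\vdots \\ x'_n &\equiv a_n \cdot x_n + b_n \bmod p_n \end{aligned} \tag{54}$$

and

$$\begin{aligned} y'_1 &\equiv c_1 \cdot y_1 + d_1 \bmod p_1 \\ &\;\;\vdots \\ y'_n &\equiv c_n \cdot y_n + d_n \bmod p_n \end{aligned} \tag{55}$$

and let $\mathrm{GCD}(a_k, p_k) = \mathrm{GCD}(c_k, p_k) = 1$ for all k = 1, ..., n. Then there exist such $\lambda_k$ and
$\mu_k$ that $\lambda_k \cdot a_k \equiv 1 \bmod p_k$ and $\mu_k \cdot c_k \equiv 1 \bmod p_k$. Then
$\lambda_k \cdot x'_k - \lambda_k \cdot b_k \equiv x_k \bmod p_k$ and $\mu_k \cdot y'_k - \mu_k \cdot d_k \equiv y_k \bmod p_k$.

We have:

$$\begin{aligned} x_k \cdot y_k &\equiv (\lambda_k x'_k - \lambda_k b_k) \cdot (\mu_k y'_k - \mu_k d_k) \\ &= \lambda_k \mu_k x'_k y'_k - \lambda_k \mu_k b_k y'_k - \lambda_k \mu_k d_k x'_k + \lambda_k \mu_k b_k d_k. \end{aligned}$$

Multiplying both sides of the last equation by some $\theta_k \not\equiv 0 \bmod p_k$ we obtain

$$\theta_k x_k y_k \equiv \theta_k \lambda_k \mu_k x'_k y'_k - \theta_k \lambda_k \mu_k b_k y'_k - \theta_k \lambda_k \mu_k d_k x'_k + \theta_k \lambda_k \mu_k b_k d_k$$

or

$$\theta_k x_k y_k - \theta_k \lambda_k \mu_k b_k d_k \equiv \theta_k \lambda_k \mu_k x'_k y'_k - \theta_k \lambda_k \mu_k b_k y'_k - \theta_k \lambda_k \mu_k d_k x'_k.$$

Then choose different representatives of $\mu_k$ and $\theta_k$ and denote

$$\begin{aligned} \theta_k \lambda_k \mu_k &\equiv \alpha_k \bmod p_k, \\ \theta_k \lambda_k \mu_k b_k &\equiv \beta_k \bmod p_k, \\ \theta_k \lambda_k \mu_k d_k &\equiv \gamma_k \bmod p_k, \\ \theta_k \lambda_k \mu_k b_k d_k &\equiv \delta_k \bmod p_k. \end{aligned}$$

Then we get a formula for multiplication of integers represented in mixed encoding:

$$\theta_k x_k y_k - \delta_k \equiv \alpha_k x'_k y'_k - \beta_k y'_k - \gamma_k x'_k \bmod p_k.$$

Observable values: $(x'_k, y'_k, \alpha_k, \beta_k, \gamma_k)$.
The world of possible solutions is $(x, y, a_k, b_k, c_k, d_k, p_k)$.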
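
The multiplication formula can be verified for one component with a short Python sketch. It assumes the constants $\alpha_k, \beta_k, \gamma_k, \delta_k$ are the residues of $\theta\lambda\mu$, $\theta\lambda\mu b$, $\theta\lambda\mu d$ and $\theta\lambda\mu b d$ respectively, and that the offset $\delta_k$ sits on the left-hand side of the final congruence; parameter values are hypothetical.

```python
def mixed_mul_check(p, a, b, c, d, theta, x, y):
    """Return (theta*x*y - delta mod p, encoded-world expression mod p) for one component."""
    lam, mu = pow(a, -1, p), pow(c, -1, p)   # inverses of a and c modulo p
    alpha = theta * lam * mu % p
    beta = alpha * b % p
    gamma = alpha * d % p
    delta = alpha * b * d % p
    xe = (a * x + b) % p
    ye = (c * y + d) % p
    rhs = (alpha * xe * ye - beta * ye - gamma * xe) % p  # uses observable values only
    lhs = (theta * x * y - delta) % p                     # what the result means
    return lhs, rhs


print(mixed_mul_check(p=11, a=3, b=4, c=5, d=2, theta=6, x=6, y=8))  # (1, 1)
```

As with addition, the encoded-world expression never involves a, b, c, d or p-independent plain values; it yields the product under a further linear encoding with scale $\theta_k$ and offset $-\delta_k$.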



6 Resistance of alternative mixed encoding

6.1 Resistance of a sum $A_1 x_1 + \cdots + A_n x_n$

Firstly let us consider the resistance of the sum of two variables x and y: z = x + y.

Let integers x and y be represented in mixed encoding as

$$\begin{aligned} x'_1 &\equiv a_1 \cdot x_1 + b_1 \bmod p_1 \\ &\;\;\vdots \\ x'_n &\equiv a_n \cdot x_n + b_n \bmod p_n \end{aligned} \tag{56}$$

and

$$\begin{aligned} y'_1 &\equiv c_1 \cdot y_1 + d_1 \bmod p_1 \\ &\;\;\vdots \\ y'_n &\equiv c_n \cdot y_n + d_n \bmod p_n \end{aligned} \tag{57}$$

and let

$$\mathrm{GCD}(a_k, p_k) = \mathrm{GCD}(c_k, p_k) = 1 \tag{58}$$

for k = 1, ..., n.

Find $\lambda_k$ and $\mu_k$ satisfying equations

$$\lambda_k \cdot a_k \equiv 1 \bmod p_k, \quad \mu_k \cdot c_k \equiv 1 \bmod p_k \tag{59}$$

For each k we choose $m_k^{(1)}$ and $m_k^{(2)}$ such that:

$$\mathrm{GCD}(m_k^{(1)}, p_k) = \mathrm{GCD}(m_k^{(2)}, p_k) = 1 \tag{60}$$

and

$$m_k^{(1)} \equiv m_k^{(2)} \bmod p_k \tag{61}$$

Now choose:

$$\begin{aligned} \Lambda_k &= m_k^{(1)} \cdot \lambda_k \bmod p_k \\ M_k &= m_k^{(2)} \cdot \mu_k \bmod p_k \end{aligned} \tag{62}$$

In a mixed encoding for finding the encoded z we will calculate:

$$\begin{aligned} z'_1 &= \Lambda_1 \cdot x'_1 + M_1 \cdot y'_1 \\ &\;\;\vdots \\ z'_n &= \Lambda_n \cdot x'_n + M_n \cdot y'_n \end{aligned} \tag{63}$$

Earlier we introduced a measure of encoding resistance in the encoded world
as the number of possible solutions which can correspond to the observable
encoded world. Note that in this case the observable world consists of the following
parameters: $\Lambda_k, M_k, x'_k, y'_k$ (k = 1, ..., n), and the number of possible solutions (let us
denote it as $R_w$) is the number of solutions of the corresponding system of equations
(56 - 63). The solution (i.e., one of the possible solutions $R_w$) is a set of values for
$x_k, y_k, a_k, b_k, c_k, d_k, p_k$ (k = 1, ..., n). Now we will estimate $R_w$.
Proposition 1. For fixed $\Lambda_k, M_k, x'_k, y'_k$ and $a_k, p_k$ (k = 1, ..., n) the number of
possible solutions $R_w \ge p^2$, where $p = p_1 \cdot p_2 \cdot \ldots \cdot p_n$.

Proof: To prove this we note that any values of $x_k$ or $y_k$ (0, 1, 2, ..., $p_k - 1$) will
give a solution of our system, as for any $x_k$ ($y_k$) we can choose such $b_k$ ($d_k$) that the
value of $x'_k$ ($y'_k$) does not change.

Proposition 2. For fixed $x'_k, y'_k$ and $x_k, y_k$ (k = 1, ..., n) the number of possible
solutions is $R_w \ge 2^{2n}$, where $p = p_1 \cdot p_2 \cdot \ldots \cdot p_n$.

Proof: It is known that $\Lambda_k = m_k^{(1)} \cdot \lambda_k$ and $M_k = m_k^{(2)} \cdot \mu_k$. One can check that
there exist such solutions:

$(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (1, 1, \Lambda_k, M_k)$;

$(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (1, M_k, \Lambda_k, 1)$;

$(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (\Lambda_k, 1, 1, M_k)$;

$(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (\Lambda_k, M_k, 1, 1)$.

Proposition 3. The number of possible solutions is $R_w \ge 2^{2n} \cdot p^2$.

Proof: This proposition is a corollary of Propositions 1 and 2.

Now we can state a more general variant of the second proposition:

Proposition 4. For constant $x'_k, y'_k$ and $x_k, y_k$ (k = 1, ..., n), for any $\Lambda_k^{(1)}, \Lambda_k^{(2)}$
($M_k^{(1)}, M_k^{(2)}$) such that $\Lambda_k = \Lambda_k^{(1)} \cdot \Lambda_k^{(2)}$ ($M_k = M_k^{(1)} \cdot M_k^{(2)}$) there exists a solution of our
system such that $m_k^{(1)} = \Lambda_k^{(1)}$ and $\lambda_k = \Lambda_k^{(2)}$ ($m_k^{(2)} = M_k^{(1)}$ and $\mu_k = M_k^{(2)}$).

As we can see, the greater the number of divisors of $\Lambda_k$ and $M_k$ is, the greater
is the number of possible solutions. Note that during these computations we can
choose $m_k^{(1)}$ and $m_k^{(2)}$. So using $m_k^{(1)}$ and $m_k^{(2)}$ with large numbers of divisors can
greatly increase the resistance of the computations. If for any k the number of
divisors of $\Lambda_k$ and $M_k$ is at least q then $R_w \ge q^{2n} \cdot p^2$.
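
The effect described in Proposition 4 can be made concrete by enumerating factorizations. The Python sketch below lists every way to split an observed constant into a (multiplier, inverse) pair and counts the combinations for one component k; the function names and the sample values are hypothetical, and only positive factors are considered.

```python
def factor_pairs(n):
    """All ordered pairs (m, l) of positive integers with m * l == n."""
    return [(d, n // d) for d in range(1, n + 1) if n % d == 0]


def candidate_count(Lam, M):
    """Number of (m1, lambda, m2, mu) candidates consistent with observed Lam and M."""
    return len(factor_pairs(Lam)) * len(factor_pairs(M))


print(factor_pairs(28))          # [(1, 28), (2, 14), (4, 7), (7, 4), (14, 2), (28, 1)]
print(candidate_count(28, 162))  # 6 * 10 = 60
print(candidate_count(7, 13))    # 2 * 2 = 4, the minimum of Proposition 2 for n = 1
```

When both constants are prime the count falls to the four trivial solutions, which is why multipliers with many divisors are preferred.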
Now returning to the question of resistance of the sum: $A_1 x_1 + \cdots + A_m x_m$.
Based on the results obtained above it is possible to prove that $R_w \ge 2^{mn} \cdot p^m$.

Some improvement of the statement claimed in Proposition 2 is possible;
indeed, it is possible to prove that $R_w \ge (n! + 2^{2n} - 1) \cdot p^2$.

Proof: It is known that $\Lambda_k = m_k^{(1)} \cdot \lambda_k$ and $M_k = m_k^{(2)} \cdot \mu_k$. One may check that
there exist solutions:

$(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (1, 1, \Lambda_k, M_k)$;

$(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (1, M_k, \Lambda_k, 1)$;

$(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (\Lambda_k, 1, 1, M_k)$;

$(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (\Lambda_k, M_k, 1, 1)$.

We note that in the case $(m_k^{(1)}, m_k^{(2)}, \lambda_k, \mu_k) = (1, 1, \Lambda_k, M_k)$, the number of
solutions $R_w \ge n!$; in this case we have $\lambda_k = \Lambda_k$ and $\mu_k = M_k$, thus $\Lambda_k \cdot a_k \equiv 1 \bmod p_k$
and $M_k \cdot c_k \equiv 1 \bmod p_k$.

Let us suppose that $p = p_1 \cdot p_2 \cdot \ldots \cdot p_n$, that p is fixed, and that the values of its
divisors are also fixed but not their place (i.e., the index k).

This means that we have n! variants of the order for $p_k$, and for each variant
we have one more solution.

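6.2 Resistance of a product $y_1^{\alpha_1} \cdots y_t^{\alpha_t}$

First consider the resistance of the product of two variables x and y: z = x · y.
Recall the representation in mixed encoding per equations (56 - 57):

$$\begin{aligned} x'_1 &\equiv a_1 \cdot x_1 + b_1 \bmod p_1 \\ &\;\;\vdots \\ x'_n &\equiv a_n \cdot x_n + b_n \bmod p_n \end{aligned}$$

and

$$\begin{aligned} y'_1 &\equiv c_1 \cdot y_1 + d_1 \bmod p_1 \\ &\;\;\vdots \\ y'_n &\equiv c_n \cdot y_n + d_n \bmod p_n \end{aligned}$$

and let $\mathrm{GCD}(a_k, p_k) = \mathrm{GCD}(c_k, p_k) = 1$ for all k = 1, ..., n. Then there exist such $\lambda_k$
and $\mu_k$ that $\lambda_k \cdot a_k \equiv 1 \bmod p_k$ and $\mu_k \cdot c_k \equiv 1 \bmod p_k$. Then
$\lambda_k \cdot x'_k - \lambda_k \cdot b_k \equiv x_k \bmod p_k$ and $\mu_k \cdot y'_k - \mu_k \cdot d_k \equiv y_k \bmod p_k$.

Then we have:

$$\begin{aligned} x_k \cdot y_k &\equiv (\lambda_k x'_k - \lambda_k b_k) \cdot (\mu_k y'_k - \mu_k d_k) \\ &\equiv \lambda_k \mu_k x'_k y'_k - \lambda_k \mu_k b_k y'_k - \lambda_k \mu_k d_k x'_k + \lambda_k \mu_k b_k d_k \end{aligned}$$

Multiply both sides of the last equation by some $\theta_k \not\equiv 0 \bmod p_k$. Then we have:

$$\theta_k x_k y_k \equiv \theta_k \lambda_k \mu_k x'_k y'_k - \theta_k \lambda_k \mu_k b_k y'_k - \theta_k \lambda_k \mu_k d_k x'_k + \theta_k \lambda_k \mu_k b_k d_k$$

or

$$\theta_k x_k y_k - \theta_k \lambda_k \mu_k b_k d_k \equiv \theta_k \lambda_k \mu_k x'_k y'_k - \theta_k \lambda_k \mu_k b_k y'_k - \theta_k \lambda_k \mu_k d_k x'_k$$

Then choose different representations of $\mu_k$, $\theta_k$ and denote:

$$\begin{aligned} \theta_k \lambda_k \mu_k &\equiv \alpha_k, \\ \theta_k \lambda_k \mu_k b_k &\equiv \beta_k, \\ \theta_k \lambda_k \mu_k d_k &\equiv \gamma_k \text{ and} \\ \theta_k \lambda_k \mu_k b_k d_k &\equiv \delta_k \end{aligned}$$

Thus, we obtain:

$$\theta_k x_k y_k - \delta_k \equiv \alpha_k x'_k y'_k - \beta_k y'_k - \gamma_k x'_k$$

The observable world in this case is determined by the following parameters:
$(\alpha_k, \beta_k, \gamma_k)$ for all k = 1, ..., n.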

Proposition 5. For fixed $\alpha_k, \beta_k, \gamma_k, x'_k, y'_k$ and $b_k, d_k$ (k = 1, ..., n) the number
of possible solutions $R_w \ge \varphi^2(p)$, where $p = p_1 \cdot p_2 \cdot \ldots \cdot p_n$ and $\varphi(N)$ is the Euler
function, which is the number of positive integers coprime with (positive integer) N
and less than N.

Proof. From the definition of multiplication in mixed encoding the following
equations hold

$$\begin{aligned} x'_k - b_k &\equiv a_k x_k \bmod p_k \\ y'_k - d_k &\equiv c_k y_k \bmod p_k \end{aligned} \tag{64}$$

and

$$\begin{aligned} \alpha_k &\equiv \theta_k a_k^{-1} c_k^{-1} \bmod p_k \\ \beta_k &\equiv \alpha_k b_k \bmod p_k \\ \gamma_k &\equiv \alpha_k d_k \bmod p_k \end{aligned} \tag{65}$$

We can choose $\varphi(p_k)$ solutions of equations $a_k x_k \equiv q_k \bmod p_k$ and $c_k y_k \equiv q'_k \bmod p_k$
for fixed $q_k$ and $q'_k$, taking arbitrary $a_k$ and $c_k$ which are coprime to $p_k$. Then
there exist $\varphi^2(p_k)$ solutions of equations (64 - 65) for any fixed $\alpha_k, \beta_k, \gamma_k, x'_k, y'_k$ and
$b_k, d_k$. As the $p_k$ are mutually coprime and the Euler function is multiplicative, the
number of all possible solutions for all $p_1, \ldots, p_n$ then is
$(\varphi(p_1) \cdots \varphi(p_n))^2 = \varphi^2(p_1 \cdots p_n) = \varphi^2(p)$.

Returning to the question of resistance of the product of independent
variables: $x_1 \cdots x_m$. Based on the results above, it can be shown that $R_w \ge \varphi^m(p)$,
where $p = p_1 \cdot p_2 \cdot \ldots \cdot p_n$.
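
The counting argument in the proof can be reproduced by brute force for small moduli. The Python sketch below computes the Euler function directly from its definition, counts the pairs (a, x) with a coprime to p and a·x congruent to a fixed q, and checks multiplicativity on the base {3, 5, 7}. The function names are hypothetical, and the brute-force approach is intended only for small numbers.

```python
from math import gcd


def phi(n):
    """Euler function: count of integers in 1..n that are coprime with n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def count_pairs(p, q):
    """Number of (a, x) with gcd(a, p) = 1, 0 <= x < p and a*x = q (mod p)."""
    return sum(1
               for a in range(1, p) if gcd(a, p) == 1
               for x in range(p) if (a * x - q) % p == 0)


print(phi(15), count_pairs(15, 4))         # 8 8
print(phi(3) * phi(5) * phi(7), phi(105))  # 48 48
print(phi(105) ** 2)                       # 2304, the bound of Proposition 5 for p = 105
```

Each admissible a determines x uniquely, so the number of pairs equals the Euler function of the modulus; squaring accounts for the independent choice of c and y.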
To avoid technical difficulties in estimating resistance of the product
$y_1^{\alpha_1} \cdot \ldots \cdot y_t^{\alpha_t}$ we introduce multiple encodings of the same input data. This means that every
new appearance of an input variable will be encoded with independent parameters as a
new variable. For example, in the formula $y = x^2$, variable x will be encoded twice
with different parameters and we can consider the encoded formula as the product
of independent variables.

In this way, we reduce the problem of estimating the resistance of
$y_1^{\alpha_1} \cdot \ldots \cdot y_t^{\alpha_t}$ to estimating the resistance of the product of independent variables
$z_1 \cdot \ldots \cdot z_T$, where $T = \sum_{i=1}^{t} \alpha_i$, and obtain an estimate of the resistance $R_w$ of the
product $y_1^{\alpha_1} \cdot \ldots \cdot y_t^{\alpha_t}$ as:

$$R_w \ge \varphi^T(p)$$

6.3 Resistance of polynomials of several variables ("multinomials")

In this section we estimate resistance of general computations with integers
containing both additions and multiplications.

Let us consider a multivariate polynomial

$$P(x_1, \ldots, x_n) = \sum_{\alpha} c_\alpha \, x_1^{\alpha_1} \cdots x_n^{\alpha_n}$$

and assume that we wish to hide some of its coefficients from an attacker by mixed
encoding. Let N be the number of summands in the above formula.

Let us consider the following way to compute a polynomial P:

1. in each monomial compute all multiplications of variables;

2. multiply them by corresponding coefficients $c_\alpha$;

3. compute P performing N additions of obtained results.

We again use multiple encodings of the same input data: every new
appearance of each input variable will be encoded with independent parameters as a
new variable. This reduces the problem of estimating resistance of a polynomial P to
the problem of estimating resistance of the sum of variables:

$$c_1 y_1 + \cdots + c_N y_N$$

where N is the number of summands in the formula for P, each $y_i$ is the product of
independent variables and sets of such variables are disjoint for any pair $y_i, y_j$.
In a manner similar to that used to obtain the results for resistance of the
product of independent variables (Section 6.2) and the results for resistance of linear
combination of independent variables (Section 6.1) in mixed encoding, we can prove
the following:

Proposition. The resistance $R_w$ of computing $P(x_1, \ldots, x_n)$ can be estimated
as:

$R_w \geq \varphi^S(p)$

where $S = \sum_{i=1}^{N} m_i - N$, $m_i$ is the number of independent variables in the
$i$th summand, N is the number of summands and $\varphi(n)$ is the Euler function.
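
A small numeric sketch of the proposition follows. It assumes that, under multiple encodings, the number of independent variables in a summand equals the total degree of that monomial, and that the estimate is phi(p) raised to the power S with S equal to the sum of those counts minus the number of summands. The function names and the sample polynomial are hypothetical.

```python
from math import gcd
from typing import Sequence, Tuple


def euler_phi(n: int) -> int:
    """Euler's function: how many integers in 1..n are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def multinomial_estimate(
    p: int, monomials: Sequence[Tuple[int, ...]]
) -> Tuple[int, int]:
    """Return (S, phi(p)**S) for a polynomial given as exponent tuples."""
    n_summands = len(monomials)
    # Assumption: m_i is the total degree of the i-th monomial, because
    # every appearance of a variable is encoded as a new variable.
    s = sum(sum(exponents) for exponents in monomials) - n_summands
    return s, euler_phi(p) ** s


# Hypothetical polynomial: c1*x1**2*x2 + c2*x1*x2 + c3*x2**3, with p = 15.
s, estimate = multinomial_estimate(15, [(2, 1), (1, 1), (0, 3)])
print(s, estimate)
```

Here the summands contain 3, 2 and 3 independent variables, so S = 8 - 3 = 5, and with phi(15) = 8 the estimate is 8^5 = 32768.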

7. Exemplary Implementations 

General implementations of the invention will now be presented. 
Broadly speaking, the analysis technique of the invention could be applied for 
successive fragments of code by performing the following steps: 

1. first, computing the ambiguity measures for various encodings;

2. then selecting the particular encoding that has an ambiguity level sufficient to
meet the security needs; and then

3. applying the selected encoding to the code fragment. 

The same steps are then performed for the remaining fragments in the targeted set 
of code, to arrive at a tamper-resistant set of code. 
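
The three steps can be sketched as a small selection routine. Everything here is hypothetical: the function `encode_fragment`, the representation of a candidate as a pair of callables, the toy ambiguity figures, and the policy of picking the least ambiguous candidate that still meets the requirement are assumptions made for illustration only.

```python
from typing import Callable, Dict, Optional, Tuple

# A candidate encoding: (ambiguity measure, function applying the encoding).
Encoding = Tuple[Callable[[str], int], Callable[[str], str]]


def encode_fragment(
    fragment: str, candidates: Dict[str, Encoding], required_ambiguity: int
) -> Tuple[Optional[str], str]:
    # Step 1: compute the ambiguity measure of every candidate encoding.
    scores = {name: measure(fragment) for name, (measure, _) in candidates.items()}
    # Step 2: keep the candidates that meet the security need and pick one
    # (assumed policy: the least ambiguous candidate that still suffices).
    adequate = [name for name, score in scores.items() if score >= required_ambiguity]
    if not adequate:
        return None, fragment  # nothing suitable; leave the fragment as it is
    chosen = min(adequate, key=lambda name: scores[name])
    # Step 3: apply the selected encoding to the code fragment.
    return chosen, candidates[chosen][1](fragment)


# Toy candidates with made-up ambiguity values.
candidates: Dict[str, Encoding] = {
    "linear": (lambda f: 10**3, lambda f: f"linear({f})"),
    "residue": (lambda f: 10**6, lambda f: f"residue({f})"),
    "mixed": (lambda f: 10**9, lambda f: f"mixed({f})"),
}
print(encode_fragment("z = x + y", candidates, 10**5))
```

Calling the routine once per fragment of the targeted code mirrors the repetition described above.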

Clearly, the analysis system of the invention could easily be incorporated into 
a routine such as that of Figure 2 simply by adding some further steps. Such a 
routine is presented in the flow chart of Figure 5. 

This routine begins at step 30 by converting the targeted software code to 
SSA form using a compiler front end. The parameters which define the bounds of 
possible encodings are then established at steps 100 and 102. While the discussion 
of the invention has focussed mainly on the effectiveness of the various encoding 
techniques, there are other considerations that may affect which technique is used in 
a certain application. For example, it may be desirable to consider: 

4. the degree to which different encodings cause code expansion; and 

5. the increased processing burden. 

Certain parameters will be set as a matter of user preference, while others
will be limited by the platform or application (such as the bit width of an operating
system or smart card, or the allowable processing time to maintain real-time
performance). These parameters would be established at steps 100 and 102
respectively.

Next, the routine considers each phase of the SSA code at step 38, as
described above. For each phase, the routine walks the SSA graph at step 104,
collecting the data necessary to effect the proposed encoding, and also to perform
the effectiveness calculations.

The effectiveness calculations are then performed at step 106, and an
optimal encoding is identified. As noted above, the selection of an optimal encoding
may turn on a number of factors in addition to the overall effectiveness.
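
One way to picture the identification of an optimal encoding under such additional factors is sketched below. The `Candidate` record, its fields, the limits and the ranking rule (highest ambiguity first, then lowest code expansion and slowdown) are hypothetical; the text does not prescribe a particular rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    ambiguity: int          # effectiveness measure of the encoding
    code_expansion: float   # encoded size divided by original size
    slowdown: float         # encoded run time divided by original run time


def identify_optimal(
    candidates: Sequence[Candidate], max_expansion: float, max_slowdown: float
) -> Optional[Candidate]:
    """Pick the most ambiguous candidate that respects the given limits."""
    feasible = [
        c for c in candidates
        if c.code_expansion <= max_expansion and c.slowdown <= max_slowdown
    ]
    if not feasible:
        return None
    # Prefer higher ambiguity; break ties by lower expansion, then lower slowdown.
    return max(feasible, key=lambda c: (c.ambiguity, -c.code_expansion, -c.slowdown))


# Made-up figures for three candidate encodings.
pool = [
    Candidate("linear", 10**3, 1.5, 1.2),
    Candidate("residue", 10**6, 3.0, 2.5),
    Candidate("mixed", 10**9, 6.0, 5.0),
]
best = identify_optimal(pool, max_expansion=4.0, max_slowdown=3.0)
print(best)
```

With these limits the most ambiguous candidate is excluded for exceeding both limits, so the sketch settles on the next best one.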

Steps 42, 44 and 46 are then performed as described above, effecting the
optimal encoding on the targeted software.

Once all phases have been encoded, the SSA graph is converted to
executable object code by means of a compiler back end, at step 48, completing the
routine.

In section 6.3, we analysed the resistance of polynomial equations with
several variables. These multinomials occur commonly in computations underlying
many applications, such as:

• task or job-shop scheduling problems involving areas or volumes to be
processed with fixed personnel (such as installing floor tiles);

• bank interest calculations for fixed numbers of interest intervals;

• curve-fitting, where we try to find a formula which fits observed data;

• ballistics;

• computer graphics; and

• approximations for many other kinds of computations, where we use the
multinomial approximation to avoid a much more expensive, precise
calculation.

Multinomial encoding can be applied to any of the above because a
multinomial encoding of a multinomial is itself a multinomial; i.e., we replace an
unencoded multinomial with an encoded one.

Since ordinary additions, subtractions, multiplications, and exponentiations
are just very simple instances of multinomials, multinomial encoding can be applied
to such ordinary computations as well: the multinomial encoding of a multinomial
(however simple) is itself a multinomial.
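
The closure property can be checked on the simplest case, a single multiplication z = x·y. The sketch below assumes linear encodings x' = a1·x + b1, y' = a2·y + b2 and z' = a3·z + b3 with a3 chosen as k·a1·a2 so that the arithmetic stays in the integers; the function name and all constants are hypothetical. Expanding a3·x·y + b3 in terms of x' and y' gives a polynomial in the encoded inputs.

```python
def encoded_product(xe: int, ye: int, b1: int, b2: int, k: int, b3: int) -> int:
    """Compute z' directly from x' and y' without decoding either input.

    With a3 = k * a1 * a2:
        z' = k*(x' - b1)*(y' - b2) + b3
           = k*x'*y' - k*b2*x' - k*b1*y' + (k*b1*b2 + b3)
    which is again a multinomial in x' and y'.
    """
    return k * xe * ye - k * b2 * xe - k * b1 * ye + (k * b1 * b2 + b3)


# Hypothetical encoding constants and plain values.
a1, b1 = 5, 11
a2, b2 = 3, -4
k, b3 = 2, 9
a3 = k * a1 * a2

x, y = 7, -3
xe = a1 * x + b1          # encoded x
ye = a2 * y + b2          # encoded y
ze = encoded_product(xe, ye, b1, b2, k, b3)

# The encoded result matches the encoding of the plain product.
print(ze == a3 * (x * y) + b3)
```

The coefficients k, -k·b2, -k·b1 and k·b1·b2 + b3 are what an observer of the encoded computation would see; the original operation x·y no longer appears explicitly.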
Wherever polynomials of one or several variables occur in computations, we
can apply the multinomial technique, and compute its ambiguity using the formula
given in section 6.3 above.

Figure 6 presents a flow chart of a simple implementation of an algorithm for
effecting the multinomial encoding technique. This routine is much like that of
Figure 3, described above. First, at step 120, a fragment of code from the SSA
graph is analysed to determine whether it defines a multinomial equation suitable for
multinomial encoding. If so, a suitable set of multinomial equations is defined at
step 122 that accomplishes the desired encoding.
Like the case of the polynomial encoding, the values of constants are
generally unrestricted, the main concern being that the constants are small enough
to avoid overflow. Thus, the values of constants in the encoding equations may be
selected randomly at step 62, within the allowable constraints of the program. At
the decision block of step 64 it is then determined whether the entire SSA graph has
been traversed, and if not, the compiler steps incrementally to the next code
fragment by means of step 66. Otherwise, the phase is complete.
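
The random selection of constants subject to an overflow constraint might look like the sketch below. It assumes a linear encoding x' = a·x + b, plain values bounded in magnitude by `value_range`, and a signed machine word of `bits` bits; the function name and the rule of reserving half the headroom for the multiplier are hypothetical choices.

```python
import random
from typing import Tuple


def pick_linear_constants(value_range: int, bits: int, rng: random.Random) -> Tuple[int, int]:
    """Randomly pick (a, b) so that a*x + b fits a signed word for |x| <= value_range."""
    limit = 2 ** (bits - 1) - 1
    # Reserve half of the headroom for the multiplier, the rest for the offset.
    max_a = limit // (2 * value_range)
    if max_a < 2:
        raise ValueError("value range too large for the given word size")
    a = rng.randint(2, max_a)
    b = rng.randint(0, limit - a * value_range)
    return a, b


rng = random.Random(2002)  # fixed seed only to make the sketch repeatable
a, b = pick_linear_constants(value_range=1000, bits=32, rng=rng)
print(a, b, a * 1000 + b <= 2**31 - 1)
```

A production tool would more likely draw the constants from a cryptographically strong source; the seeded generator here only keeps the example reproducible.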

Variations on this technique would be clear to one skilled in the art. 

Similarly, the "alternative mixed encoding technique" described in section 5
can also be implemented using a routine similar to that presented in Figures 3 and
6. Such a routine is presented in the flow chart of Figure 7.

First, at step 140, a fragment of code from the SSA graph is analysed to
determine whether it performs integer addition, subtraction or multiplication. If so, a
suitable set of mixed encodings is defined at step 142 for the inputs and
the output, in each of which all linear multipliers are coprime to all moduli.

Like the other encodings described above, the values of constants in the
encoding equations are then randomly selected at step 62, within the allowable
constraints of the program.

At the decision block of step 64 it is then determined whether the entire SSA
graph has been traversed, and if not, the compiler steps incrementally to the next
code fragment by means of step 66. Otherwise, the phase is complete.
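
The coprimality condition can be illustrated with a toy mixed encoding of an addition. The sketch assumes that a value x is first encoded linearly as a·x + b and then represented by its residues modulo pairwise coprime moduli, and that both operands share the multiplier a; these choices, the function names and the constants are hypothetical and are not taken from section 5. Choosing a coprime to every modulus makes a invertible modulo their product, so the result can be decoded.

```python
import random
from math import gcd, prod
from typing import List, Sequence


def pick_multiplier(moduli: Sequence[int], rng: random.Random) -> int:
    """Randomly pick a linear multiplier coprime to all moduli."""
    p = prod(moduli)
    while True:
        a = rng.randrange(2, p)
        if all(gcd(a, m) == 1 for m in moduli):
            return a


def encode(x: int, a: int, b: int, moduli: Sequence[int]) -> List[int]:
    """Linear step a*x + b followed by the residue representation."""
    return [(a * x + b) % m for m in moduli]


def add_encoded(u: Sequence[int], v: Sequence[int], moduli: Sequence[int]) -> List[int]:
    """Add two encoded values component by component."""
    return [(ui + vi) % m for ui, vi, m in zip(u, v, moduli)]


def decode_sum(residues: Sequence[int], a: int, b_total: int, moduli: Sequence[int]) -> int:
    """Recombine residues (Chinese remainder theorem) and undo the linear step."""
    p = prod(moduli)
    value = 0
    for r, m in zip(residues, moduli):
        q = p // m
        value += r * q * pow(q, -1, m)
    return ((value - b_total) * pow(a, -1, p)) % p


moduli = [7, 11, 13]                 # pairwise coprime, product 1001
rng = random.Random(7)               # seeded only for repeatability
a = pick_multiplier(moduli, rng)
bx, by = 5, 17                       # hypothetical offsets
x, y = 123, 456
total = add_encoded(encode(x, a, bx, moduli), encode(y, a, by, moduli), moduli)
print(decode_sum(total, a, bx + by, moduli) == x + y)
```

The modular inverses use the three-argument form of `pow` with exponent -1, which requires Python 3.8 or later. The decoded sum is correct only while x + y stays below the product of the moduli.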

Variations on this technique would be clear to one skilled in the art. 






8. Summary and Future Work 

This report is devoted to measures of the resistance of data encodings. We
introduced a measure of encoding resistance in the encoded world as a measure of
uncertainty, that is, the number of possible solutions which can correspond to the
observable encoded world. An attacker observing only operations in the encoded world
and inputs to the encoded world (i.e., all encoded input data) cannot distinguish between
any of the possible solutions; hence, the larger the number of corresponding
possible solutions, the greater the uncertainty and the resistance of the encoding.

It is important to note that such measures characterize the resistance of 
encoding to an arbitrary attack (exhaustive search) which uses only information from 
the encoded world. 
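
The notion of resistance as a count of indistinguishable possibilities can be made concrete with a brute-force toy. The sketch assumes a linear encoding x' = a·x + b in which the attacker observes only x' and knows the ranges from which a, b and x were drawn; the function name and the ranges are hypothetical, and the count is an illustration rather than one of the estimates derived in this report.

```python
def count_consistent_worlds(observed: int, a_max: int, b_max: int, x_max: int) -> int:
    """Count triples (a, b, x) with a*x + b == observed inside the given ranges."""
    count = 0
    for a in range(1, a_max + 1):
        for b in range(0, b_max + 1):
            rest = observed - b
            # x is determined by (a, b); it must be a non-negative integer in range.
            if rest >= 0 and rest % a == 0 and rest // a <= x_max:
                count += 1
    return count


# Hypothetical observation and parameter ranges.
ambiguity = count_consistent_worlds(observed=12, a_max=4, b_max=3, x_max=12)
print(ambiguity)
```

For these small ranges there are 9 combinations that produce the same observed value, so an attacker restricted to the encoded world cannot tell which of them is the real one.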

We have presented estimates of resistance of linear, residue and mixed
encodings for addition and multiplication, and shown that the maximal resistance is
obtained with mixed encoding. More resistant schemes for performing multiplication
in mixed encoding have also been shown.

We have estimated the resistance of computation of an arbitrary multivariate
polynomial in mixed encoding and proposed several ways to increase the resistance
of arbitrary computations in mixed encoding.

This report is preliminary and there are many possibilities for technical 
improvements for some of the estimates presented. 

The following observations will aid in the performance of future work: 

1. Dependence of resistance on the algorithm

Resistance can depend on the algorithm, so it is desirable to find the most
resistant schemes of computations.

2. Multiple encodings

It is of interest to determine how to increase resistance of computations when
some variable x appears several times in a computation. One possible way is to use
multiple encodings for x. We have used this method to estimate the resistance of
an arbitrary multivariate polynomial in mixed encoding.

Wide Applications 

Tamper-resistant encoding in a manner of the invention has very wide
possible uses:

1. Protecting the innovation of a software algorithm. For example, if one wished
to sell software containing a new and faster algorithm to solve the linear
programming problem, one would like to sell the software without disclosing
the method.

2. Protecting the innovation of a software model. In hardware design, it is
common for vendors of ASIC cell libraries to provide precise software models
so that users can perform accurate system simulations. However, it would be
desirable to do so without giving away the actual cell design.

3. Wrapping behaviour together. Often, it is desirable to write some software
that will perform a function "A" if and only if an event "B" occurs. For
example, a certain function is performed only if payment is made.

4. Hiding secrets, such as adding encryption keys or electronic signatures into a
program, so that the program can sign things and encrypt/decrypt things,
without leaking the key.

Clearly, there are other applications and combinations of applications. For
example, an electronic key could be included in a decoder program and the
decoding tied to electronic payment, thereby providing an electronic commerce
solution.

While particular embodiments of the present invention have been shown and 
described, it is clear that changes and modifications may be made to such 
embodiments without departing from the true scope and spirit of the invention. 

It is understood that as de-compiling and debugging tools become more and
more powerful, the degree to which the techniques of the invention must be applied
to ensure tamper protection will also rise. As well, the concern for system resources
may also be reduced over time as the cost and speed of computer execution and
memory storage capacity continue to improve.

These improvements in system resources will also increase the attacker's
ability to overcome the simpler tamper-resistance techniques included in the scope
of the claims. It is understood, therefore, that the utility of some of the simpler
encoding techniques that fall within the scope of the claims may correspondingly
decrease over time. That is, just as in the world of cryptography, increasing
key-lengths become necessary over time in order to provide a given level of protection,
so in the world of the instant invention, increasing complexity of encoding will
become necessary to achieve a given level of protection.

As noted above, it is also understood that computer control and software is
becoming more and more common. It is understood that software encoded in the
manner of the invention is not limited to the applications described, but may be
applied to any manner of software, stored or executing.

The method steps of the invention may be embodied in sets of executable
machine code stored in a variety of formats such as object code or source code.
Such code is described generically herein as programming code, or a computer
program for simplification. Clearly, the executable machine code may be integrated
with the code of other programs, implemented as subroutines, by external program
calls or by other techniques as known in the art.

The embodiments of the invention may be executed by a computer processor
or similar device programmed in the manner of method steps, or may be executed
by an electronic system which is provided with means for executing these steps.
Similarly, an electronic memory means such as computer diskettes, CD-ROMs, Random
Access Memory (RAM), Read Only Memory (ROM) or similar computer software
storage media known in the art, may be programmed to execute such method steps.
As well, electronic signals representing these method steps may also be transmitted
via a communication network.

It would also be clear to one skilled in the art that this invention need not be
limited to the existing scope of computers and computer systems. Credit, debit,
bank and smart cards could be encoded to apply the invention to their respective
applications. An electronic commerce system in a manner of the invention could, for
example, be applied to parking meters, vending machines, pay telephones, inventory
control or rental cars, using magnetic strips or electronic circuits to store the
software and passwords. Again, such implementations would be clear to one skilled
in the art, and do not take away from the invention.
WHAT IS CLAIMED IS:

1. A method of increasing the tamper-resistance and obscurity of computer
software code comprising the steps of:

proposing a set of possible encoding techniques;

calculating the number of possible solutions that would correspond to each of said
set of possible encoding techniques; and

encoding said target program using the encoding technique that results in the
greatest number of possible solutions.

2. A method of increasing the tamper-resistance and obscurity of computer 
software code comprising the steps of: 

selecting an original computation; 

proposing a set of possible encoding techniques which could be applied to said 
original computation; 

for each of said possible encoding techniques, calculating the number of distinct 
alternative computations which could be encoded to produce exactly said 
encoded computation using said possible encoding technique, said number 
of distinct alternative computations constituting the degree of ambiguity of 
said possible encoding technique for said original computation; and 

encoding said original computation with the encoding technique which provides a 
sufficiently large degree of ambiguity to satisfy a specified security 
requirement for said original computation. 

3. The method of claim 1, wherein said step of calculating comprises the step
of:

calculating the number of possible solutions that would correspond to each of said
set of possible encoding techniques for the linear encoding of a sum:
$R_w \geq K^{n+1}/A$, where:

$R_w$ is the number of possible corresponding solutions;
K is the range of variables in the system;
n is the number of encoded variables which are summed together in the
linear encoding; and
A is a range of variation of $a_1, \ldots, a_n$.
4. The method of claim 1, wherein said step of calculating comprises the step
of:

calculating the number of possible solutions that would correspond to each of said
set of possible encoding techniques for the linear encoding of a product: $R_w$
equal to the product of the numbers of divisors of the integers $x' - b_1$ and
$y' - b_2$, where:

$R_w$ is the number of possible corresponding solutions;
$x' = a_1 \cdot x + b_1$; and
$y' = a_2 \cdot y + b_2$.

5. The method of claim 1, wherein said step of calculating comprises the step
of:

calculating the number of possible solutions that would correspond to each of said
set of possible encoding techniques for a residue encoding: $R_w \geq S(p, k)$,
where:

$R_w$ is the number of possible corresponding solutions;
$p = p_1 \cdots p_k$; and

$S(x, k)$ is the number of different representations of integer x as a product of
k mutually coprime numbers.

6. The method of claim 1, wherein said step of calculating comprises the step
of: 

calculating the number of possible solutions that would correspond to each of said 
set of possible encoding techniques for the residue encoding of a sum: R w z 
(nl + ZP-l) -p 2 , where: 

$R_w$ is the number of possible corresponding solutions;
$p = p_1 \cdot p_2 \cdots p_k$; and

n is the number of elements that are being summed together. 

7. The method of claim 1, wherein said step of calculating comprises the step
of:

calculating the number of possible solutions that would correspond to each of said
set of possible encoding techniques for the residue encoding of a product:
$R_w \geq \varphi^T(p)$, where:

$R_w$ is the number of possible corresponding solutions;
$p = p_1 \cdot p_2 \cdots p_n$;

$\varphi(N)$ is the Euler function, which is the number of positive integers coprime
with (positive integer) N and less than N; and



8. The method of claim 1, wherein said step of calculating comprises the step
of:

calculating the number of possible solutions that would correspond to each of said
set of possible encoding techniques for encoding using polynomials of
several variables:
$R_w \geq \varphi^S(p)$, where:

$R_w$ is the number of possible corresponding solutions;
$p = p_1 \cdot p_2 \cdots p_n$;

$\varphi(N)$ is the Euler function, which is the number of positive integers coprime
with (positive integer) N and less than N; and

$S = \sum_{i=1}^{N} m_i - N$, where $m_i$ is the number of independent variables in the
$i$th summand, and N is the number of summands.

9. A method of measuring encoding resistance in the encoded world as a 
measure of uncertainty. 

10. A method of measuring encoding resistance as the number of worlds which 
can correspond to the observed encoded world. 

11. A method of increasing the tamper-resistance and obscurity of computer 
software code comprising the steps of: 

responding to a code fragment defining an integer addition, subtraction or 
multiplication operation by: 

encoding said integer addition, subtraction or multiplication operation using a 
combination of linear and residue number encoding techniques; and 
selecting random values of constants in said new linear and residue number
encoding equations. 

12. A method of increasing the tamper-resistance and obscurity of computer 
software code comprising the steps of: 

responding to a code fragment defining a multinomial equation by: 

encoding said multinomial equation using a multinomial encoding technique; 
and 

selecting random values of constants in said new multinomial encoded 
equation. 

13. A method of data encoding comprising the step of manipulating factors to
maximize the number of possible solutions. 

14. A system for executing the method of any one of claims 1-13. 

15. An apparatus for executing the method of any one of claims 1-13. 

16. A computer readable memory medium for storing software code executable 
to perform the method of any one of claims 1-13. 

17. A carrier signal incorporating software code executable to perform the
method of any one of claims 1-13. 

18. A data structure comprising the output data of any one of claims 1-13. 



WO 02/095546 



PCT/CA02/00754 



[FIGURE 1 (sheet 1/7): drawing not reproduced. Surviving labels: MODEM;
reference numerals 10 and 26.]



[FIGURE 2 (sheet 2/7): flow chart, not reproduced. Surviving box labels:
SOURCE CODE; COMPILER FRONT END; OPTIMIZE INTERMEDIATE CODE; NEXT PHASE;
WALK S.S.A. GRAPH TO DETERMINE CODING CHANGES; WALK S.S.A. GRAPH AND
PERFORM CODING CHANGES; INSERT ANY REQUIRED RE-CODING OPERATIONS;
COMPILER BACK END; TAMPER-RESISTANT OBJECT CODE. Surviving reference
numerals: 30, 36, 40, 44, 46.]



[FIGURE 3 (sheet 3/7): flow chart, not reproduced. Surviving box labels:
UNPROTECTED CODE; DOES A CODE FRAGMENT DEFINE A POLYNOMIAL EQUATION?
(YES/NO); DEFINE A SET OF FIRST ORDER POLYNOMIAL EQUATIONS THAT REMOVES
THE ORIGINAL ALGEBRAIC OPERATION; SELECT RANDOM VALUES FOR CONSTANTS IN
NEW EQUATIONS; MOVE TO NEXT CODE FRAGMENT; TAMPER-RESISTANT CODE.
Surviving reference numerals: 58, 60, 62, 66.]



[FIGURE 4 (sheet 4/7): flow chart, not reproduced. Surviving box labels:
UNPROTECTED CODE; DEFINE A BASE SET OF PAIRWISE RELATIVE PRIMES; SELECT A
RESIDUAL BASIS FROM THE SET OF PAIRWISE RELATIVE PRIMES; SELECT A CODE
FRAGMENT AND TRANSFORM USING RESIDUAL BASIS; CALCULATE A CORRESPONDING SET
OF EXECUTION CONSTANTS AND STORE WITH TRANSFORMED CODE; MOVE TO NEXT CODE
FRAGMENT; TAMPER-RESISTANT CODE. Surviving reference numerals: 68, 70, 72,
74, 78.]



[FIGURE 5 (sheet 5/7): flow chart, not reproduced. Surviving box labels:
SOURCE CODE; COMPILER FRONT END; ESTABLISH USER PREFERENCES; ESTABLISH
SYSTEM LIMITATIONS; NEXT PHASE; WALK S.S.A. GRAPH TO DETERMINE PROPOSED
CODING CHANGES; IDENTIFY OPTIMAL ENCODING; WALK S.S.A. GRAPH AND PERFORM
CODING CHANGES; INSERT ANY REQUIRED RE-CODING OPERATIONS; COMPILER BACK
END; TAMPER-RESISTANT OBJECT CODE. Surviving reference numerals: 30, 44,
46, 100, 102, 104, 106.]



[FIGURE 6 (sheet 6/7): flow chart, not reproduced. Surviving box labels:
DEFINE A SET OF MULTINOMIAL EQUATIONS THAT REMOVE THE ORIGINAL ALGEBRAIC
OPERATION; SELECT RANDOM VALUES FOR CONSTANTS IN NEW EQUATIONS; MOVE TO
NEXT CODE FRAGMENT; TAMPER-RESISTANT CODE. Surviving reference numerals:
62, 66.]



[FIGURE 7 (sheet 7/7): flow chart, not reproduced. Surviving box labels:
PERFORM THE OPERATION USING MIXED ENCODINGS FOR THE INPUTS AND THE OUTPUT
IN EACH OF WHICH ALL LINEAR MULTIPLIERS ARE COPRIME TO ALL MODULI; SELECT
RANDOM VALUES FOR CONSTANTS IN NEW EQUATIONS; MOVE TO NEXT CODE FRAGMENT;
TAMPER-RESISTANT CODE. Surviving reference numerals: 62, 66.]



